Conversely, an image may be passed to the write method of the VideoWriter class, which appends the image to the file. (In our example, we read from a VideoCapture constructed for 'MyInputVid' and write via a VideoWriter to 'MyOutputVid'.) The video's filename must be specified, and any preexisting file with that name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. This encoding is widely compatible but produces large files.

Depending on the chosen codec, the file extension should be avi, ogv, or flv. A frame rate and frame size must be specified, too. Since we are copying from another video, these properties can be read via the get method of the VideoCapture class.
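As a minimal sketch of copying a video this way (the filenames are placeholders, and the property constants shown are the modern cv2 names; older OpenCV versions spell them differently):

import cv2

# Read properties from the input video so the output matches it.
videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I', '4', '2', '0'), fps, size)

# Copy the video frame by frame until read() reports failure.
success, frame = videoCapture.read()
while success:
    videoWriter.write(frame)
    success, frame = videoCapture.read()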

Capturing camera frames

A stream of camera frames is also represented by the VideoCapture class. However, for a camera, we construct a VideoCapture object by passing the camera's device index instead of a video's filename.

For the purpose of creating an appropriate VideoWriter class for the camera, we have to either make an assumption about the frame rate as we did in the code previously or measure it using a timer. The latter approach is better and we will cover it later in this chapter. The number of cameras and their ordering is of course system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties.

If an invalid index is used to construct a VideoCapture object, it will not yield any frames; its read method will return (False, None). The read method is inappropriate when we need to synchronize a set of cameras or a multi-head camera, such as a stereo camera or a Kinect.

In such cases, we use the grab and retrieve methods instead.

Displaying camera frames in a window

OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow, imshow, and destroyWindow functions. Also, any window may capture keyboard input via the waitKey function and mouse input via the setMouseCallback function. The waitKey function returns the code of any pressed key; for example, ord('a') returns 97. The mouse callback passed to setMouseCallback should take five arguments.
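As a sketch of these functions working together (the window name 'MyWindow' is just a placeholder), a loop that shows camera 0 until the window is clicked or any key is pressed might look like this:

import cv2

clicked = False

def onMouse(event, x, y, flags, param):
    # The mouse callback takes five arguments: event type, x, y, flags,
    # and an optional user-supplied param.
    global clicked
    if event == cv2.EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print('Showing camera feed. Click window or press any key to stop.')
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')
cameraCapture.release()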

The callback's param argument can be set as an optional third argument to setMouseCallback; by default, it is 0. Beyond keyboard and mouse input, OpenCV offers no means of handling other window events; for example, we cannot stop our application when the window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with another application framework. Later in this chapter, we will design an abstraction layer to help integrate OpenCV into any application framework.

Project concept

OpenCV is often studied through a cookbook approach that covers a lot of algorithms but nothing about high-level application development.

To an extent, this approach is understandable because OpenCV's potential applications are so diverse. Across such different use cases, can we truly study a useful set of abstractions? I believe we can and the sooner we start creating abstractions, the better.

We will structure our study of OpenCV around a single application but, at each step, we will design a component of this application to be extensible and reusable. This type of application covers a broad range of OpenCV's functionality and challenges us to create an efficient, effective implementation. Users would immediately notice flaws, such as a low frame rate or inaccurate tracking. To get the best results, we will try several approaches using conventional imaging and depth imaging.

Specifically, our application will perform real-time facial merging. Given two streams of camera input (or, optionally, prerecorded video input), the application will superimpose faces from one stream atop faces in the other. Filters and distortions will be applied to give the blended scene a unified look and feel. Users should have the experience of being engaged in a live performance where they enter another environment and another persona.

This type of user experience is popular in amusement parks such as Disneyland. We will call our application Cameo. A cameo is, in jewelry, a small portrait of a person or, in film, a very brief role played by a celebrity.

An object-oriented design

Python applications can be written in a purely procedural style. However, from now on, we will use an object-oriented style because it promotes modularity and extensibility.

No matter how we obtain a stream of images or where we send it as output, we can apply the same application-specific logic to each frame in this stream. Our application code may use a CaptureManager to read new frames and, optionally, to dispatch each frame to one or more outputs, including a still image file, a video file, and a window (via a WindowManager class).

A WindowManager class lets our application code handle a window and events in an object-oriented style. Both CaptureManager and WindowManager are extensible.

CaptureManager

As we have seen, OpenCV can capture, show, and record a stream of images from either a video file or a camera, but there are some special considerations in each case.

Our CaptureManager class abstracts some of the differences and provides a higher-level interface for dispatching images from the capture stream to one or more outputs—a still image file, a video file, or a window. A CaptureManager object is initialized with a VideoCapture object and has enterFrame and exitFrame methods that should typically be called on every iteration of an application's main loop.

Between a call to enterFrame and a call to exitFrame, the application may, any number of times, set a channel property and get a frame property. The channel property is initially 0, and only multi-head cameras use other values. The frame property is an image corresponding to the current channel's state when enterFrame was called.
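As a sketch of this usage pattern (CaptureManager is the class described here; processFrame is a hypothetical placeholder for application-specific logic), a main loop might look like this:

import cv2

captureManager = CaptureManager(cv2.VideoCapture(0))
while True:  # exit logic omitted in this sketch
    captureManager.enterFrame()
    frame = captureManager.frame   # image from the current channel
    if frame is not None:
        processFrame(frame)        # manipulate the frame in place
    captureManager.exitFrame()     # show, record, and release the frame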

Actual file writing is postponed until exitFrame. Also, during the exitFrame method, the frame property may be shown in a window, depending on whether the application code provides a WindowManager either as an argument to the constructor of CaptureManager or by setting the previewWindowManager property. If the application code manipulates frame, the manipulations are reflected in any recorded files and in the window. A CaptureManager class has a constructor argument and property called shouldMirrorPreview, which should be True if we want frame to be mirrored (horizontally flipped) in the window but not in recorded files.

Typically, when facing a camera, users prefer the live camera feed to be mirrored. Recall that a VideoWriter object needs a frame rate, but OpenCV does not provide any way to get an accurate frame rate for a camera. The CaptureManager class works around this limitation by using a frame counter and Python's standard time module to estimate the frame rate. This approach is not foolproof; depending on frame rate fluctuations and the system-dependent implementation of time.time(), the accuracy of the estimate may still be poor.

However, if we are deploying to unknown hardware, it is better than just assuming that the user's camera has a particular frame rate. The implementation turns out to be quite long, so we will look at it in several pieces. The non-public variables of CaptureManager relate to the state of the current frame and any file-writing operations.
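The estimate itself boils down to dividing the number of elapsed frames by the elapsed wall-clock time. A standalone sketch of the idea (independent of the full class) might be:

import time

startTime = None
framesElapsed = 0
fpsEstimate = None

def onFrameEntered():
    """Update the frame counter and frame-rate estimate for one new frame."""
    global startTime, framesElapsed, fpsEstimate
    if startTime is None:
        startTime = time.time()
    else:
        timeElapsed = time.time() - startTime
        if timeElapsed > 0:
            fpsEstimate = framesElapsed / timeElapsed
    framesElapsed += 1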

As previously discussed, application code only needs to configure a few things, which are implemented as constructor arguments and settable public properties: the camera channel, the window manager, and the option to mirror the camera preview.

By convention, in Python, variables that are prefixed with a single underscore should be treated as protected (accessed only within the class and its subclasses), while variables that are prefixed with a double underscore should be treated as private (accessed only within the class).

Continuing with our implementation, let's add the enterFrame and exitFrame methods to managers.py. The enterFrame method captures a new frame from the current channel; the frame property's getter may then retrieve and cache the frame on demand. The exitFrame method writes to any pending files and then releases the frame.

The implementation of exitFrame takes the image from the current channel, estimates a frame rate, shows the image via the window manager (if any), and fulfills any pending requests to write the image to files. To finish our class implementation, let's add the remaining file-writing methods to managers.py; they create a VideoWriter as needed. In situations where the frame rate is unknown, we skip some frames at the start of the capture session so that we have time to build up an estimate of the frame rate.

Although our current implementation of CaptureManager relies on VideoCapture, we could make other implementations that do not use OpenCV for input. For example, we could make a subclass that was instantiated with a socket connection, whose byte stream could be parsed as a stream of images.

Also, we could make a subclass that used a third-party camera library with different hardware support than what OpenCV provides. However, for Cameo, our current implementation is sufficient.

Abstracting a window and keyboard — managers.WindowManager

As we have seen, OpenCV provides functions to create and destroy a window, show an image, and process events.

Rather than being methods of a window class, these functions require a window's name to be passed as an argument. Since this interface is not object-oriented, it is inconsistent with OpenCV's general style. Also, it is unlikely to be compatible with other window or event handling interfaces that we might eventually want to use instead of OpenCV's. For the sake of object orientation and adaptability, we abstract this functionality into a WindowManager class with createWindow, destroyWindow, show, and processEvents methods.

As a property, a WindowManager class has a function object called keypressCallback, which (if not None) is called from processEvents in response to any key press. However, we could modify WindowManager to support mouse events too. For example, the class's interface could be expanded to include a mouseCallback property and an optional constructor argument, but it could otherwise remain the same.

With some event framework other than OpenCV's, we could support additional event types in the same way, by adding callback properties. For example, a Pygame-based implementation of WindowManager can improve on the base class by properly handling quit events, such as when the user clicks on the window's close button. Potentially, many other event types can be handled via Pygame too.

Applying everything — cameo.Cameo

Our application is represented by a class, Cameo, with two methods: run and onKeypress.

On initialization, a Cameo object creates a WindowManager object with onKeypress as a callback, as well as a CaptureManager object using a camera and the WindowManager object.

When run is called, the application executes a main loop in which frames and events are processed. As a result of event processing, onKeypress may be called. In the same directory as managers.py, let's create a file called cameo.py containing this class. When the application runs, the live camera feed is mirrored in the preview window, while any recorded files are not. This is the intended behavior, as we pass True for shouldMirrorPreview when initializing the CaptureManager object. So far, we do not manipulate the frames in any way except to mirror them for preview.

We will start to add more interesting effects in Chapter 3, Filtering Images.

Summary

By now, we should have an application that displays a camera feed, listens for keyboard input, and (on command) records a screenshot or screencast. We are ready to extend the application by inserting some image-filtering code (Chapter 3, Filtering Images) between the start and end of each frame. Optionally, we are also ready to integrate other camera drivers or other application frameworks (Appendix A, Integrating with Pygame), besides the ones supported by OpenCV.

Our goal is to achieve artistic effects, similar to the filters that can be found in image editing applications such as Photoshop or Gimp. As we proceed with implementing filters, you can try applying them to any BGR image and then saving or displaying the result.

To fully appreciate each effect, try it with various lighting conditions and subjects. By the end of this chapter, we will integrate filters into the Cameo application. Thus, we should separate the filters into their own Python module or file. Let's create a file called filters.py. It should contain the following import statements: import cv2, import numpy, and import scipy. We will also add helper functions to a module called utils.py as we go.

Channel mixing — seeing in Technicolor

Channel mixing is a simple technique for remapping colors. The color at a destination pixel is a function of the color at the corresponding source pixel only. More specifically, each channel's value at the destination pixel is a function of any or all channels' values at the source pixel. In pseudocode, for a BGR image:

dst.b = funcB(src.b, src.g, src.r)
dst.g = funcG(src.b, src.g, src.r)
dst.r = funcR(src.b, src.g, src.r)

Potentially, we can map a scene's colors much differently than a camera normally does or our eyes normally do.

By assigning equal values to any two channels, we can collapse part of the color space and create the impression that our palette is based on just two colors of light (blended additively) or two inks (blended subtractively). This type of effect can offer nostalgic value because early color films and early digital graphics had more limited palettes than digital graphics today. As examples, let's invent some notional color spaces that are reminiscent of early Technicolor movies and early CGA graphics.

Our first color space, RC (red, cyan), collapses blue and green into a single cyan channel. Our other two spaces, RGV (red, green, value) and CMV (cyan, magenta, value), cannot represent all hues on their own, so we need to specify value or whiteness as well; these spaces resemble, respectively, Technicolor Process 1 and CGA Palette 1. (For color images, see the electronic edition of this book.) Let's start with RC. Blue and green can mix to make cyan. By averaging the B and G channels and storing the result in both B and G, we effectively collapse these two channels into one, C. To support this effect, let's add the following function to filters.py. The source and destination images must both be in BGR format.

Blues and greens are replaced with cyans. In pseudocode:

dst.b = dst.g = 0.5 * (src.b + src.g)
dst.r = src.r

Using split, we extract our source image's channels as one-dimensional arrays. Having put the data in this format, we can write clear, simple channel mixing code.

Using addWeighted, we replace the B channel's values with an average of B and G. The arguments to addWeighted are (in order): the first source array, a weight applied to the first source array, the second source array, a weight applied to the second source array, a constant added to the result, and a destination array. Using merge, we replace the values in our destination image with the modified channels.

Note that we use b twice as an argument because we want the destination's B and G channels to be equal. Similar steps—splitting, modifying, and merging channels—can be applied to our other color space simulations as well.
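Putting these steps together, a sketch of the recolorRC function (the exact name and signature are assumptions consistent with the description above) might be:

import cv2

def recolorRC(src, dst):
    """Simulate conversion from BGR to RC (red, cyan).

    The source and destination images must both be in BGR format.
    Blues and greens are replaced with cyans.
    """
    b, g, r = cv2.split(src)
    # Replace the B channel's values with an average of B and G.
    cv2.addWeighted(b, 0.5, g, 0.5, 0, b)
    # b is passed twice so that the destination's B and G channels are equal.
    cv2.merge((b, b, r), dst)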

Our intuition might say that we should set all B-channel values to 0 because RGV cannot represent blue. However, this change would be wrong because it would discard the blue component of lightness and, thus, turn grays and pale blues into yellows. Instead, we want grays to remain gray while pale blues become gray. To achieve this result, we should reduce B values to the per-pixel minimum of B, G, and R. Let's implement this effect in filters.py; blues are desaturated.
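A sketch of this recolorRGV function (again, the name is an assumption) could be:

import cv2

def recolorRGV(src, dst):
    """Simulate conversion from BGR to RGV (red, green, value).

    Blues are desaturated by reducing each pixel's B value to the
    per-pixel minimum of B, G, and R.
    """
    b, g, r = cv2.split(src)
    # cv2.min overwrites b with the per-element minimum of the inputs.
    cv2.min(b, g, b)
    cv2.min(b, r, b)
    cv2.merge((b, g, r), dst)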

To desaturate yellows, we should instead increase B values to the per-pixel maximum of B, G, and R. Here is an implementation that we can add to filters.py; yellows are desaturated. By design, the three preceding effects tend to produce major color distortions, especially when the source image is colorful in the first place. If we want to craft subtle effects, channel mixing with arbitrary functions is probably not the best approach.

Curves — bending color space

Curves are another technique for remapping colors.

Channel mixing and curves are similar insofar as the color at a destination pixel is a function of the color at the corresponding source pixel only. However, in the specifics, channel mixing and curves are dissimilar approaches. With curves, a channel's value at a destination pixel is a function of only the same channel's value at the source pixel. Moreover, we do not define the functions directly; instead, for each function, we define a set of control points from which the function is interpolated.

We will use cubic spline interpolation whenever the number of control points is sufficient. Most of this work is done for us by a SciPy function called interp1d, which takes two arrays (x and y coordinates) and returns a function that interpolates the points. As an optional argument to interp1d, we may specify a kind of interpolation, which, in principle, may be linear, nearest, zero, slinear, quadratic, or cubic, though not all options are implemented in the current version of SciPy.

Let's edit utils.py. The array of control points must be ordered such that x increases from one index to the next. Typically, for natural-looking effects, the y values should increase too, and the first and last control points should be (0, 0) and (255, 255) in order to preserve black and white. Note that we will treat x as a channel's input value and y as the corresponding output value.

For example, a control point whose y value exceeds its x value in the middle of the range would brighten a channel's midtones. Note that cubic interpolation requires at least four control points. If there are only two or three control points, we fall back to linear interpolation but, for natural-looking effects, this case should be avoided.
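A sketch of such a helper, here called createCurveFunc (the name and default behavior are assumptions based on the description above), could be added to utils.py:

import scipy.interpolate

def createCurveFunc(points):
    """Return a function derived from (x, y) control points, ordered by x."""
    if points is None:
        return None
    numPoints = len(points)
    if numPoints < 2:
        return None
    xs, ys = zip(*points)
    if numPoints < 4:
        kind = 'linear'  # cubic interpolation needs at least 4 points
    else:
        kind = 'cubic'
    return scipy.interpolate.interp1d(xs, ys, kind, bounds_error=False)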

However, this function might be expensive. We do not want to run it once per channel, per pixel (which would amount to hundreds of thousands of calls per frame when applied to three channels of video). Fortunately, we are typically dealing with just 256 possible input values (in 8 bits per channel) and we can cheaply precompute and store that many output values. Then, our per-channel, per-pixel cost is just a lookup of the cached output value.
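Sketches of the two lookup helpers in utils.py might look like this; applyLookupArray is named in the text that follows, while createLookupArray is an assumed name for the precomputation step:

import numpy

def createLookupArray(func, length=256):
    """Return a lookup of a function's outputs for whole-number inputs.

    The lookup values are clamped to [0, length - 1].
    """
    if func is None:
        return None
    lookupArray = numpy.empty(length)
    i = 0
    while i < length:
        funcValue = func(i)
        lookupArray[i] = min(max(0, funcValue), length - 1)
        i += 1
    return lookupArray

def applyLookupArray(lookupArray, src, dst):
    """Map a source to a destination using a lookup."""
    if lookupArray is None:
        return
    # Use the source's values as indices into the lookup array and copy
    # the looked-up values into the destination via slice assignment.
    dst[:] = lookupArray[src]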

The lookup values are clamped to [0, length - 1]. The applyLookupArray function works by using a source array's values as indices into the lookup array. Python's slice notation [:] is used to copy the looked-up values into a destination array. What if we always want to apply two or more curves in succession? Performing multiple lookups is inefficient and may cause loss of precision. We can avoid this problem by combining two curve functions into one function before creating a lookup array.

The two functions' arguments must be of compatible types. Note the use of Python's lambda keyword to create an anonymous function. Here is a final optimization issue: what if we want to apply the same curve to all channels of an image?

Splitting and remerging channels is wasteful in this case because we do not need to distinguish between channels. We just need one-dimensional indexing, as used by applyLookupArray. The approach in createFlatView works for images with any number of channels.
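Sketches of these two helpers, based on the descriptions above (createCompositeFunc is an assumed name for the function-combining helper; createFlatView is named in the text), might be:

def createCompositeFunc(func0, func1):
    """Return a composite of two functions: func0 applied after func1."""
    if func0 is None:
        return func1
    if func1 is None:
        return func0
    # A lambda creates the anonymous composite function.
    return lambda x: func0(func1(x))

def createFlatView(array):
    """Return a one-dimensional view of an array of any dimensionality."""
    flatView = array.view()
    flatView.shape = array.size
    return flatView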

Thus, it allows us to abstract the difference between grayscale and color images in cases when we wish to treat all channels the same. Since each curve filter needs to keep its precomputed function and lookup array as state, the filters need to be classes, not just functions. The first such class, VFuncFilter, is instantiated with a function, which it applies to the V (value) channel of a grayscale image or to all channels of a color image.

A subclass, VCurveFilter, is instead instantiated with a set of control points, which it uses internally to create a curve function. A further pair of classes handles BGR images with per-channel curves: one of their functions is applied to all channels and the other three functions are each applied to a single channel. The overall function is applied first and then the per-channel functions. The curve-based variant, instead of being instantiated with four functions, is instantiated with four sets of control points, which it uses internally to create curve functions.

Additionally, all these classes accept a constructor argument that is a numeric type, such as numpy.uint8. This type is used to determine how many entries should be in the lookup array. Let's first look at the implementations of VFuncFilter and VCurveFilter, which may both be added to filters.py.

Here, we are also using numpy.iinfo (to derive the lookup length from the numeric type), along with split and merge. These four classes can be used as is, with custom functions or control points being passed as arguments at instantiation. Alternatively, we can make further subclasses that hard-code certain functions or control points.

Such subclasses could be instantiated without any arguments.

Emulating photo films

A common use of curves is to emulate the palettes that were common in pre-digital photography. Every type of photo film has its own unique rendition of color (or grays), but we can generalize about some of the differences from digital sensors.

Film tends to suffer loss of detail and saturation in shadows, whereas digital tends to suffer these failings in highlights. Also, film tends to have uneven saturation across different parts of the spectrum, so each film has certain colors that pop or jump out. Thus, when we think of good-looking film photos, we may think of scenes or renditions that are bright and that have certain dominant colors. At the other extreme, we may remember the murky look of underexposed film that could not be improved much by the efforts of the lab technician.

We are going to create four different film-like filters using curves. For each, we just override the constructor to specify a set of control points for each channel. The choice of control points is based on recommendations by photographer Petteri Sulonen. The Portra, Provia, and Velvia effects should produce normal-looking images.

The effect should not be obvious except in before-and-after comparisons. Portra, as a portrait film, tends to make people's complexions fairer. Also, it exaggerates certain common clothing colors, such as milky white (for example, a wedding dress) and dark blue (for example, a suit or jeans). Let's add this implementation of a Portra filter to filters.py. Provia enhances sky, water, and shade more than sun. Let's add this implementation of a Provia filter to filters.py.

Velvia can often produce azure skies in daytime and crimson clouds at sunset. The effect is difficult to emulate, but here is an attempt that we can add to filters.py. Finally, in the cross-process effect, black and white are not necessarily preserved, and contrast is very high. Cross-processed photos take on a sickly appearance: people look jaundiced, while inanimate objects look stained. Let's edit filters.py and add this cross-process filter.

We, as humans, can easily recognize many object types and their pose just by seeing a backlit silhouette or a rough sketch.

Indeed, when art emphasizes edges and pose, it often seems to convey the idea of an archetype, like Rodin's The Thinker or Joe Shuster's Superman. Software, too, can reason about edges, poses, and archetypes. We will discuss these kinds of reasoning in later chapters. For the moment, we are interested in a simple use of edges for artistic effect. We are going to trace an image's edges with bold, black lines. The effect should be reminiscent of a comic book or other illustration, drawn with a felt pen.

OpenCV provides several edge-finding filters (Laplacian, for example). These filters are supposed to turn non-edge regions to black while turning edge regions to white or saturated colors. However, they are prone to misidentifying noise as edges. This flaw can be mitigated by blurring an image before trying to find its edges. OpenCV also provides many blurring filters, including blur (simple average), medianBlur, and GaussianBlur. The arguments to the edge-finding and blurring filters vary but always include ksize, an odd whole number that represents the width and height (in pixels) of the filter's kernel.

A kernel is a set of weights that are applied to a region in the source image to generate a single pixel in the destination image. For example, a ksize of 7 implies that 49 (7 x 7) source pixels are considered in generating each destination pixel. We can think of a kernel as a piece of frosted glass moving over the source image and letting through a diffused blend of the source's light.

For blurring, let's use medianBlur, which is effective in removing digital video noise, especially in color images. For edge-finding, let's use Laplacian, which produces bold edge lines, especially in grayscale images. Once we have the result of Laplacian, we can invert it to get black edges on a white background. Then, we can normalize it so that its values range from 0 to 1 and multiply it with the source image to darken the edges. Let's implement this approach in filters.py.

The blurKsize argument is used as ksize for medianBlur, while edgeKsize is used as ksize for Laplacian. With my webcams, I find that a blurKsize value of 7 and an edgeKsize value of 5 look best. Unfortunately, medianBlur is expensive with a large ksize, such as 7. If you encounter performance problems when running strokeEdges, try decreasing the blurKsize value. To turn off blur, set it to a value less than 3.
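A sketch of strokeEdges, following the steps described above (grayscale conversion, optional median blur, Laplacian, inversion and normalization, and per-channel multiplication), might look like this; treat the exact defaults as assumptions:

import cv2

def strokeEdges(src, dst, blurKsize=7, edgeKsize=5):
    """Darken edges in a BGR image with bold, black lines."""
    if blurKsize >= 3:
        blurredSrc = cv2.medianBlur(src, blurKsize)
        graySrc = cv2.cvtColor(blurredSrc, cv2.COLOR_BGR2GRAY)
    else:
        graySrc = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
    cv2.Laplacian(graySrc, cv2.CV_8U, graySrc, ksize=edgeKsize)
    # Invert and normalize so that edges approach 0 and non-edges approach 1.
    normalizedInverseAlpha = (1.0 / 255) * (255 - graySrc)
    # Multiply each channel by the mask to darken the edges.
    channels = cv2.split(src)
    for channel in channels:
        channel[:] = channel * normalizedInverseAlpha
    cv2.merge(channels, dst)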

Custom kernels — getting convoluted

As we have just seen, many of OpenCV's predefined filters use a kernel. Remember that a kernel is a set of weights which determine how each output pixel is calculated from a neighborhood of input pixels. Another term for a kernel is a convolution matrix: it mixes up, or convolves, the pixels in a region. Similarly, a kernel-based filter may be called a convolution filter.

OpenCV provides a very versatile function, filter2D, which applies any kernel (or convolution matrix) that we specify. To understand how to use this function, let's first learn the format of a convolution matrix. It is a 2D array with an odd number of rows and columns. The central element corresponds to a pixel of interest and the other elements correspond to that pixel's neighbors. Each element contains an integer or floating-point value, which is a weight that gets applied to an input pixel's value.

Consider a sharpening kernel whose central weight is 9 and whose eight surrounding weights are -1. For the pixel of interest, the output color will be nine times its input color, minus the input colors of all eight adjacent pixels. If the pixel of interest was already a bit different from its neighbors, this difference becomes intensified. The effect is that the image looks sharper as the contrast between neighbors is increased.

The second argument of filter2D specifies the desired channel depth of the destination image; a negative value, as used here, means that the destination image has the same depth as the source image. For color images, note that filter2D applies the kernel equally to each channel. To use different kernels on different channels, we would also have to use the split and merge functions, as we did in our earlier channel mixing functions (see the section Simulating RC color space). Based on this simple example, let's add two classes to filters.py. One class, VConvolutionFilter, will represent a convolution filter in general.

A subclass, SharpenFilter, will represent our sharpening filter specifically (see the section Designing object-oriented curve filters). Note that the sharpening kernel's weights sum to 1; this should be the case whenever we want to leave the image's overall brightness unchanged. If we modify a sharpening kernel slightly so that its weights sum to 0 instead, then we have an edge detection kernel that turns edges white and non-edges black.

For example, let's add the following edge detection filter to filters.py. Generally, for a blur effect, the weights should sum to 1 and should be positive throughout the neighborhood. For example, we can take a simple average of the neighborhood, as in a BlurFilter subclass of VConvolutionFilter with a 2-pixel radius.
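Sketches of these classes, following the descriptions above (the exact kernels and the FindEdgesFilter name are assumptions), might be:

import cv2
import numpy

class VConvolutionFilter(object):
    """A filter that applies a convolution to all channels."""
    def __init__(self, kernel):
        self._kernel = kernel
    def apply(self, src, dst):
        """Apply the filter with a BGR or gray source/destination."""
        cv2.filter2D(src, -1, self._kernel, dst)

class SharpenFilter(VConvolutionFilter):
    """A sharpen filter with a 1-pixel radius (weights sum to 1)."""
    def __init__(self):
        kernel = numpy.array([[-1, -1, -1],
                              [-1,  9, -1],
                              [-1, -1, -1]])
        VConvolutionFilter.__init__(self, kernel)

class FindEdgesFilter(VConvolutionFilter):
    """An edge-finding filter with a 1-pixel radius (weights sum to 0)."""
    def __init__(self):
        kernel = numpy.array([[-1, -1, -1],
                              [-1,  8, -1],
                              [-1, -1, -1]])
        VConvolutionFilter.__init__(self, kernel)

class BlurFilter(VConvolutionFilter):
    """A blur filter with a 2-pixel radius (weights sum to 1)."""
    def __init__(self):
        kernel = numpy.array([[0.04] * 5] * 5)
        VConvolutionFilter.__init__(self, kernel)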

Sometimes, though, kernels with less symmetry produce an interesting effect. Let's consider a kernel that blurs on one side (with positive weights) and sharpens on the other (with negative weights): it will produce a ridged or embossed effect. This custom-kernel approach is very basic, indeed more basic than OpenCV's ready-made set of filters. However, with a bit of experimentation, you should be able to write your own kernels that produce a unique look.

Modifying the application

Now that we have high-level functions and classes for several filters, it is trivial to apply any of them to the captured frames in Cameo.

Let's edit cameo.py; the rest is the same as in Chapter 2. Here, I have chosen to apply two effects: stroking the edges and emulating Portra film colors. Feel free to modify the code to apply any filters you like. We should also have several more filter implementations that are easily swappable with the ones we are currently using.

Now, we are ready to proceed with analyzing each frame for the sake of finding faces to manipulate in the next chapter. Specifically, we look at Haar cascade classifiers, which analyze the contrast between adjacent image regions to determine whether or not a given image or subimage matches a known type.

We consider how to combine multiple Haar cascade classifiers in a hierarchy, such that one classifier identifies a parent region (for our purposes, a face) and other classifiers identify child regions (eyes, nose, and mouth). We also take a detour into the humble but important subject of rectangles.

By drawing, copying, and resizing rectangular image regions, we can perform simple manipulations on image regions that we are tracking. By the end of this chapter, we will integrate face tracking and rectangle manipulations into Cameo. Finally, we'll have some face-to-face interaction!

Tracking Faces with Haar Cascades

Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint?

What constitutes a recognizable part of an object? Photographic images, even from a webcam, may contain a lot of detail for our human viewing pleasure.

However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise. Moreover, even real differences in physical detail might not interest us for the purpose of classification. I was taught in school that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.

Thus, some means of abstracting image detail is useful in producing stable classification and tracking results. The abstractions are called features, which are said to be extracted from the image data.

There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on distances between the images' corresponding features. For example, distance might be defined in terms of spatial coordinates or color coordinates.

Haar-like features are one type of feature that is often applied to real-time face tracking. They were first used for this purpose by Paul Viola and Michael Jones in 2001. Each Haar-like feature describes the pattern of contrast among adjacent image regions.

For example, edges, vertices, and thin lines each generate distinctive features. For any given image, the features may vary depending on the regions' size, which may be called the window size. Two images that differ only in scale should be capable of yielding similar features, albeit for different window sizes. Thus, it is useful to generate features for multiple window sizes. Such a collection of features is called a cascade.

We may say that a Haar cascade is scale-invariant or, in other words, robust to changes in scale. OpenCV provides a classifier and tracker for scale-invariant Haar cascades, which it expects to be in a certain file format.

Haar cascades, as implemented in OpenCV, are not robust to changes in rotation. For example, an upside-down face is not considered similar to an upright face, and a face viewed in profile is not considered similar to a face viewed from the front. A more complex and more resource-intensive implementation could improve Haar cascades' robustness to rotation by considering multiple transformations of images as well as multiple window sizes.

However, we will confine ourselves to the implementation in OpenCV. An OpenCV installation includes a haarcascades folder, which contains cascades that are trained for certain subjects using tools that come with OpenCV. Once you find haarcascades, create a directory called cascades in the same folder as cameo.py and copy the relevant cascade files there. These cascades require a frontal, upright view of the subject. We will use them later when building a high-level tracker.

With a lot of patience and a powerful computer, you can make your own cascades, trained for various types of objects. Let's make new modules for our tracking classes and their helpers: a file called trackers.py will hold the trackers, and we should put the following import statements at the start of trackers.py.

Defining a face as a hierarchy of rectangles

Before we start implementing a high-level tracker, we should define the type of tracking result that we want to get.

For many applications, it is important to estimate how objects are posed in real, 3D space. However, our application is about image manipulation, so we care more about 2D image space. An upright, frontal view of a face should occupy a roughly rectangular region in the image. Within such a region, eyes, a nose, and a mouth should occupy rough rectangular subregions. Let's open trackers.py and add a class that holds a face and its subregions as rectangles in the (x, y, w, h) format; note that OpenCV sometimes uses a compatible representation, but not always.
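A minimal sketch of such a Face class (the attribute names are assumptions consistent with the descriptions later in this chapter) could be:

class Face(object):
    """Data on facial features: face, eyes, nose, mouth.

    Each rectangle is stored in the (x, y, w, h) format, or is None if the
    corresponding feature was not detected.
    """
    def __init__(self):
        self.faceRect = None
        self.leftEyeRect = None
        self.rightEyeRect = None
        self.noseRect = None
        self.mouthRect = None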

As noted above, OpenCV sometimes requires a different representation; for example, sometimes it expects the upper-left and lower-right corners as coordinate pairs.

Tracing, cutting, and pasting rectangles

When I was in primary school, I was poor at crafts. I often had to take my unfinished craft projects home, where my mother volunteered to finish them for me so that I could spend more time on the computer instead.

I shall never cut and paste a sheet of paper, nor an array of bytes, without thinking of those days. Just as in crafts, mistakes in our graphics program are easier to see if we first draw outlines.

For debugging purposes, Cameo will include an option to draw lines around any rectangles represented by a Face. OpenCV provides a rectangle function for drawing. However, its arguments represent a rectangle differently than Face does.

For convenience, let's add the following wrapper of rectangle to rects.py. Next, Cameo must support copying one rectangle's contents into another rectangle. We can read or write a rectangle within an image by using Python's slice notation. For copying, a complication arises if the source and destination rectangles are of different sizes. Certainly, we expect two faces to appear at different sizes, so we must address this case. OpenCV provides a resize function that allows us to specify a destination size and an interpolation method.

Combining slicing and resizing, we can add the following implementation of a copy function, copyRect, to rects.py; it resizes the source rectangle's content and puts the result in the destination rectangle. Now, what if we want to swap the contents of two rectangles in the same image? Consider the following approach, which is wrong:

copyRect(image, image, rect0, rect1)  # overwrite rect1
copyRect(image, image, rect1, rect0)  # copy from rect1 - but it was already overwritten. Oops!

Instead, we need to copy one of the rectangles to a temporary array before overwriting anything. Let's edit rects.py and add a swapRects function that swaps the contents of a whole list of rectangles: each rectangle's content is destined for the next rectangle, except that the last rectangle's content is destined for the first rectangle.
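Sketches of copyRect and swapRects, following the descriptions above (the interpolation default is an assumption), might be:

import cv2

def copyRect(src, dst, srcRect, dstRect, interpolation=cv2.INTER_LINEAR):
    """Copy part of the source to part of the destination, resizing as needed."""
    x0, y0, w0, h0 = srcRect
    x1, y1, w1, h1 = dstRect
    # Resize the source sub-rectangle and put the result in the
    # destination sub-rectangle.
    dst[y1:y1+h1, x1:x1+w1] = cv2.resize(
        src[y0:y0+h0, x0:x0+w0], (w1, h1), interpolation=interpolation)

def swapRects(src, dst, rects, interpolation=cv2.INTER_LINEAR):
    """Copy the source with two or more sub-rectangles swapped circularly."""
    if dst is not src:
        dst[:] = src
    numRects = len(rects)
    if numRects < 2:
        return
    # Copy the last rectangle's contents into a temporary array.
    x, y, w, h = rects[numRects - 1]
    temp = src[y:y+h, x:x+w].copy()
    # Copy each rectangle's contents into the next rectangle.
    i = numRects - 2
    while i >= 0:
        copyRect(src, dst, rects[i], rects[i + 1], interpolation)
        i -= 1
    # Copy the temporarily stored contents into the first rectangle.
    copyRect(temp, dst, (0, 0, w, h), rects[0], interpolation)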

This approach should serve us well enough for Cameo, but it is still not entirely foolproof. Intuition might tell us that the following code should leave image unchanged:

swapRects(image, image, [rect0, rect1])
swapRects(image, image, [rect1, rect0])

However, if rect0 and rect1 overlap, our intuition may be incorrect.

If you see strange-looking results, then investigate the possibility that you are swapping overlapping rectangles.

Adding more utility functions

In the last chapter, we created a module called utils for some miscellaneous helper functions. A couple of extra helper functions will make it easier for us to write a tracker. First, it may be useful to know whether an image is in grayscale or color. We can tell based on the dimensionality of the image.
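Two small helpers along these lines might be added to utils.py; the names isGray and widthHeightDividedBy are assumptions, and the second helper will be handy later for computing minimum detection sizes:

def isGray(image):
    """Return True if the image has one channel per pixel."""
    return image.ndim < 3

def widthHeightDividedBy(image, divisor):
    """Return an image's dimensions, divided by a value."""
    h, w = image.shape[:2]
    return (w // divisor, h // divisor)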

Color images are 3D arrays, while grayscale images have fewer dimensions. An image's (or other array's) height and width are, respectively, the first two entries in its shape property, which is what these helpers rely on.

Tracking faces

The challenge in using OpenCV's Haar cascade classifiers is not just getting a tracking result; it is getting a series of sensible tracking results at a high frame rate.

One kind of common sense that we can enforce is that certain tracked objects should have a hierarchical relationship, one being located relative to the other. For example, a nose should be in the middle of a face.

By attempting to track both a whole face and parts of a face, we can enable application code to do more detailed manipulations and to check how good a given tracking result is.

A face with a nose is a better result than one without. At the same time, we can support some optimizations, such as only looking for faces of a certain size and noses in certain places. We are going to implement an optimized, hierarchical tracker in a class called FaceTracker, which offers a simple interface. A FaceTracker may be initialized with certain optional configuration arguments that are relevant to the tradeoff between tracking accuracy and performance. At any given time, the latest tracking results of FaceTracker are stored in a property called faces, which is a list of Face instances.

Initially, this list is empty. It is refreshed via an update method that accepts an image for the tracker to analyze. Finally, for debugging purposes, the rectangles of faces may be drawn via a drawDebugRects method, which accepts an image as a drawing surface.

Every frame, a real-time face-tracking application would call update, read faces, and perhaps call drawDebugRects. Internally, FaceTracker uses OpenCV's CascadeClassifier class. A CascadeClassifier is initialized with a cascade data file, such as the ones that we found and copied earlier. For our purposes, the important method of CascadeClassifier is detectMultiScale, which performs detection that may be robust to variations in scale.

The arguments to detectMultiScale include the following. The image argument is the source image to search; it must have 8 bits per channel. The scaleFactor argument controls how much the search window grows between passes; a higher value improves performance but diminishes robustness with respect to variations in scale. The minNeighbors argument should be greater than 0; it specifies how many overlapping detections are required before a match is reported, so a match may merge multiple neighboring regions, and a higher value tends to suppress false positives. The flags argument selects variations of the detection algorithm; for example, one flag tells OpenCV to scale the image between passes, whereas the default approach is the opposite: scale the feature data to match the window. Scaling the image allows for certain optimizations on modern hardware. Certain flags must not be combined with others.

The return value of detectMultiScale is a list of matches, each expressed as a rectangle in the format [x, y, w, h]. Similarly, the initializer of FaceTracker accepts scaleFactor, minNeighbors, and flags as arguments. The given values are passed to all detectMultiScale calls that a FaceTracker makes internally.

Also, during initialization, a FaceTracker creates CascadeClassifiers using face, eye, nose, and mouth data. Let's add the implementation of the initializer and the faces property to trackers.py. Equalization, as implemented in OpenCV's equalizeHist function, normalizes an image's brightness and increases its contrast.

Equalization as a preprocessing step makes our tracker more robust to variations in lighting, while conversion to grayscale improves performance. Next, we feed the preprocessed image to our face classifier. For each matching rectangle, we search certain subregions for a left and right eye, nose, and mouth. Ultimately, the matching rectangles and subrectangles are stored in Face instances in faces.

For each type of tracking, we specify a minimum object size that is proportional to the image size. Our implementation of FaceTracker continues with the update method and a helper for detecting a single facial feature within a subregion, sketched below.
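The following sketch abbreviates the feature search (only the left eye and nose are shown) and assumes the Face class and utils helpers sketched earlier; cascade filenames, default parameter values, and the exact search subregions are assumptions, so adapt them to your OpenCV installation:

import cv2
import utils  # the helper module sketched earlier (isGray, widthHeightDividedBy)

class FaceTracker(object):
    """A tracker for facial features: face, eyes, nose, mouth."""

    def __init__(self, scaleFactor=1.2, minNeighbors=2,
                 flags=cv2.CASCADE_SCALE_IMAGE):
        self.scaleFactor = scaleFactor
        self.minNeighbors = minNeighbors
        self.flags = flags
        self._faces = []
        # Cascade filenames are assumptions; use the files you copied into
        # the local 'cascades' directory.
        self._faceClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_frontalface_alt.xml')
        self._eyeClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_eye.xml')
        self._noseClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_mcs_nose.xml')

    @property
    def faces(self):
        """The tracked facial features."""
        return self._faces

    def update(self, image):
        """Update the tracked facial features."""
        self._faces = []
        # Preprocess: convert to grayscale if needed, then equalize.
        if utils.isGray(image):
            image = cv2.equalizeHist(image)
        else:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            cv2.equalizeHist(image, image)
        # Only look for faces above a certain minimum size.
        minSize = utils.widthHeightDividedBy(image, 8)
        faceRects = self._faceClassifier.detectMultiScale(
            image, self.scaleFactor, self.minNeighbors, self.flags, minSize)
        for faceRect in faceRects:
            face = Face()  # the class sketched earlier in this module
            face.faceRect = faceRect
            x, y, w, h = faceRect
            # Search an upper subregion of the face for the left eye.
            searchRect = (x + w // 7, y, w * 2 // 7, h // 2)
            face.leftEyeRect = self._detectOneObject(
                self._eyeClassifier, image, searchRect, 64)
            # Search the middle of the face for the nose.
            searchRect = (x + w // 4, y + h // 4, w // 2, h // 2)
            face.noseRect = self._detectOneObject(
                self._noseClassifier, image, searchRect, 32)
            self._faces.append(face)

    def _detectOneObject(self, classifier, image, rect,
                         imageSizeToMinSizeRatio):
        """Detect at most one object within a subregion of the image."""
        x, y, w, h = rect
        minSize = utils.widthHeightDividedBy(image, imageSizeToMinSizeRatio)
        subImage = image[y:y+h, x:x+w]
        subRects = classifier.detectMultiScale(
            subImage, self.scaleFactor, self.minNeighbors, self.flags,
            minSize)
        if len(subRects) == 0:
            return None
        subX, subY, subW, subH = subRects[0]
        # Convert the coordinates back into whole-image coordinates.
        return (x + subX, y + subY, subW, subH)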

The rectangle argument is the image subregion that the given classifier should search. For example, the nose classifier should search the middle of the face.

Limiting the search area improves performance and helps eliminate false positives. This approach works whether or not we use the optional detection flags. The drawDebugRects method implementation simply defines colors, iterates over Face instances, and draws the rectangles of each Face onto a given image using our rects.py drawing function.

Modifying the application

Let's look at two approaches to integrating face tracking and swapping into Cameo.

The first approach uses a single camera feed and swaps face rectangles found within this camera feed. The second approach uses two camera feeds and copies face rectangles from one camera feed to the other.

For now, we will limit ourselves to manipulating faces as a whole and not subelements such as eyes. However, you could modify the code to swap only eyes, for example. If you try this, be careful to check that the relevant subrectangles of the face are not None. On initialization of Cameo, we create a FaceTracker and a Boolean variable indicating whether debug rectangles should be drawn for the FaceTracker.

The Boolean is toggled in onKeypress in response to the X key. As part of the main loop in run, we update our FaceTracker with the current frame. Then, the resulting Face objects (in the faces property) are fetched and their faceRects are swapped using rects.swapRects.

Also, depending on the Boolean value, we may draw debug rectangles that reflect the original positions of facial elements before any swap. The following screenshot from Cameo shows face regions outlined after the user presses X. For the two-camera approach, we make a subclass of Cameo called CameoDouble. On initialization, a CameoDouble invokes the constructor of Cameo and also creates a second CaptureManager.

During the main loop in run, a CameoDouble gets new frames from both cameras and then gets face tracking results for both frames. Faces are copied from one frame to the other using copyRect. Then, the destination frame is displayed, optionally with debug rectangles drawn over it.

We can implement CameoDouble in cameo.py. Note that, with some hardware configurations, the application may become deadlocked while waiting for the built-in camera to supply a frame. If you encounter this issue, use two external cameras and do not use the built-in camera.

One version tracks faces in a single camera feed and, when faces are found, swaps them by copying and resizing. The other version tracks faces in two camera feeds and, when faces are found in each, copies and resizes faces from one feed to replace faces in the other. Additionally, in both versions, one camera feed is made visible and effects are applied to it.

These versions of Cameo demonstrate the basic functionality that we proposed two chapters ago. The user can displace his or her face onto another body, and the result can be stylized to give it a more unified feel.

However, the transplanted faces are still just rectangular cutouts. So far, no effort is made to cut away non-face parts of the rectangle or to align superimposed and underlying components such as eyes. The next chapter examines some more sophisticated techniques for facial blending, particularly using depth vision. As prerequisites, we need a depth camera, such as Microsoft Kinect, and we need to build OpenCV with support for our depth camera.

Creating modules

Our code for capturing and manipulating depth-camera data will be reusable outside Cameo, so we should separate it into a new module.

Let's create a file called depth.py. We need the following import statement in depth.py. To support the changes we are going to make, let's also add the following import statements to rects.py and, likewise, an import statement to cameo.py.

Capturing frames from a depth camera

Back in Chapter 2, Handling Files, Cameras, and GUIs, we discussed the concept that a computer can have multiple video capture devices and each device can have multiple channels.
