OpenCV and Augmented Reality

Homework 3 for CS 6476 - Computer Vision focused on corner detection, template matching, homography transformations, and projective geometry. This was a really fun assignment that generated some great output videos. Because these homework assignments are still being used, I’ll be brief with my code snippets.

Marker Detection in all Conditions

Using OpenCV template detection, the AR markers were identified. Because eventually the markers would have to be identified in various scales and at various angles, the center of the template was extracted so that we were only searching for the black/white cross region. A method was written to rotate the template through a variety of angles and choose only the highest match that doesn’t overlap with other markers.

Applying this template matching method to a video and the algorithm can accurately track the AR markers at a variety of angles and through different zoom levels. In addition, after the initial markers were found, processing time was boosted by only searching in a region near the last found marker.

And with a lot of noise, we still get really stable results:

Perspective Transformations

Once the corners can be reliable detected in noisy videos, we can use the points to do some fun stuff with perspective transformations. If we assume that the markers are located in a square-ish pattern, we can use projective geometry to insert an image into the area that the markers define. This will create the illusion that the image is flat against the wall and will persist through different viewing angles.

Given two lists of corner points, we generate the homography matrix that will transform the source image points to the destination points. You should really just use cv2.findHomography or cv2.getPerspectiveTransform but as this is a college class, we’ll do it by hand:

def find_homography_matrix(src_points, dst_points):
    """Solves for and returns a perspective transform.
        src_points (list): List of four (x,y) source points.
        dst_points (list): List of four (x,y) destination points.
        numpy.array: 3 by 3 homography matrix of floating point values.
    src = np.array(src_points,dtype=np.float32)
    markers = np.array(dst_points,dtype=np.float32)
    x, y = src[:,0], src[:,1]
    x_m, y_m = markers[:,0], markers[:,1]
    zero = np.zeros_like(x)
    one = np.ones_like(x)
    matrix = matrix.reshape(8,9)
    _, _, H = np.linalg.svd(matrix)

    #Normalize and reshape
    H = H[-1:]/H[-1,-1]
    H = H.reshape(3,3)
    return H

Once all the legwork is done, you can do fun things like paste a video into another video:

I found this assignment really engaging and learned about perspective transformations and homography. I also learned that OpenCV has a heck of a time writing and reading video files and will often output corrupt junk that will crash your media player.