Reputation: 16841
For the past two weeks I've been trying to figure out how to convert the following image:
To one that looks like this (may not match exactly, as this image was taken at a different time):
The first thing I noticed is that simply slicing the image and overlaying the four parts wouldn't work perfectly, because the curvature of certain lines doesn't match. For instance, the mid-court line bends left in the second slice and right in the third slice. This bending looks like barrel distortion, so I tried both a parameterized lens-correction function (passing k1, k2, and k3 to OpenCV) and lensfun. Since the lensfun database does not include my camera make or model (it's an AXIS camera) and I do not know the make or model of the lens (it's built into the camera), I wrote a small script to dump test images using various lenses with various parameters, then skimmed through the thousands of output images until I found one that looked like it had relatively straight lines:
This correction was done using the "Samyang 12mm f/2.8 Fish-Eye ED AS NCS" lens with a "Canon EOS 10D" camera in lensfun. It's probably not perfect, but I figured it was close enough to move on to step two.
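For reference, the OpenCV half of that experiment boiled down to something like this (the camera matrix and the k1/k2/k3 values here are placeholder guesses, not calibrated values):

import cv2
import numpy as np

img = cv2.imread('slice2.png')                      # hypothetical slice filename
h, w = img.shape[:2]

# Approximate pinhole camera matrix; focal length and principal point are guesses
K = np.array([[w,   0, w / 2],
              [0,   w, h / 2],
              [0,   0,     1]], dtype=np.float32)

# Radial distortion coefficients to sweep over: (k1, k2, p1, p2, k3)
dist = np.array([-0.30, 0.08, 0.0, 0.0, -0.01], dtype=np.float32)

undistorted = cv2.undistort(img, K, dist)
cv2.imwrite('slice2_undistorted.png', undistorted)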
Once the lens distortion was corrected, the second issue was that the same line in two slices pointed in different directions, which should be correctable with a simple perspective transform. So I began a long quest to figure out the proper parameters for this perspective transform.
I started by writing a cost function to judge the "quality" of a given set of parameters (overlapping pixels should match) and letting SciPy's solver minimize it. I made several tweaks to the cost function (applying a Gaussian blur, scaling down the image, converting it to grayscale, using the Sobel operator to get a gradient, looking only at the pixels on either side of a "seam" after overlapping instead of the whole overlap region, etc.), but it always failed to find a good solution. The results looked worse than the original camera image most of the time:
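Stripped down, the optimization looked roughly like this (the homography parameterization, the 50 px seam band, and the slice variables are just illustrative):

import cv2
import numpy as np
from scipy.optimize import minimize

def seam_cost(params, left, right, band=50):
    """Mean squared difference in a narrow band on either side of the seam."""
    H = np.append(params, 1.0).reshape(3, 3)        # 8 free parameters, H[2,2] fixed at 1
    h, w = left.shape[:2]
    warped = cv2.warpPerspective(right, H, (w, h))   # right slice into left's frame
    a = cv2.GaussianBlur(cv2.cvtColor(left,   cv2.COLOR_BGR2GRAY), (5, 5), 0)
    b = cv2.GaussianBlur(cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY), (5, 5), 0)
    diff = a[:, w - band:].astype(np.float32) - b[:, w - band:].astype(np.float32)
    return float(np.mean(diff ** 2))

# slice1, slice2: two adjacent lens-corrected slices (hypothetical variables)
x0 = np.eye(3).ravel()[:8]                           # start from the identity transform
result = minimize(seam_cost, x0, args=(slice1, slice2), method='Nelder-Mead')
H_best = np.append(result.x, 1.0).reshape(3, 3)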
When that failed I tried applying math to compute the proper perspective transform. I know the FOV of the camera (from the spec sheet), I know the image width and height, I know the sensor size (from the spec sheet), and using a protractor I measured the angles between the lenses. Using the pinhole model I then calculated the expected (x,y) values of points on the image plane and what transform would be necessary to correct them. The results looked better than SciPy, but were still dismal.
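The core of that math is just the pinhole projection; roughly (the FOV and slice width below are stand-ins for the spec-sheet values):

import numpy as np

# Pinhole model: a ray at angle theta from the optical axis lands at
# x = f * tan(theta) on the image plane. FOV and width are placeholders.
fov_h = np.radians(100.0)                 # horizontal FOV of one lens
width = 1920                              # width of one slice in pixels
f_px = (width / 2) / np.tan(fov_h / 2)    # focal length expressed in pixels

def project(theta_rad):
    """Horizontal pixel offset from the image centre for a ray at angle theta."""
    return f_px * np.tan(theta_rad)

# Points seen by the neighbouring lens are the same rays rotated by the measured
# inter-lens angle, so their expected positions follow from project() and can be
# fed to cv2.getPerspectiveTransform to get the correcting homography.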
Stitcher
After this I tried using OpenCV's built-in Stitcher class. However, it failed to stitch together slices 2 and 3 due to insufficient overlap between the images (and about 10% of the time it even failed to stitch together slices 1 and 2, presumably because of the non-deterministic nature of RANSAC). Even when it did succeed, the stitch wasn't that great:
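For completeness, the Stitcher attempt was essentially just this (the filenames are placeholders):

import cv2

# Hypothetical filenames for the four lens-corrected slices
slices = [cv2.imread(f'slice{i}.png') for i in range(1, 5)]

stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)   # cv2.createStitcher() on OpenCV 3.x
status, pano = stitcher.stitch(slices)

if status == cv2.Stitcher_OK:
    cv2.imwrite('stitched.png', pano)
else:
    # ERR_NEED_MORE_IMGS (1) is what comes back when the overlap is too small
    print('stitching failed with status', status)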
findHomography
Most recently I tried using ORB with a mask (only looking for features in the overlap region) and OpenCV's findHomography function to create a custom version of the Stitcher. While the matches seemed promising, the resulting stitch was still sub-optimal:
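In outline, the custom stitcher looks something like this (left and right are two adjacent lens-corrected slices, and the 200 px overlap mask width is an assumption):

import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=2000)

# Restrict feature detection to the (assumed ~200 px wide) overlap region
mask_left = np.zeros(left.shape[:2], dtype=np.uint8)
mask_left[:, -200:] = 255
mask_right = np.zeros(right.shape[:2], dtype=np.uint8)
mask_right[:, :200] = 255

kp1, des1 = orb.detectAndCompute(left, mask_left)
kp2, des2 = orb.detectAndCompute(right, mask_right)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Homography mapping the right slice into the left slice's coordinate frame
src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)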
I'm beginning to suspect that my methodology (slice -> lens correct -> perspective transform -> overlay) is flawed and there's a better way to do this.
findHomography (update)
I updated my feature detection to eliminate any matches where the Y coordinates differed drastically (e.g. matching the white of the table to the white of the lights). After doing this my number of matched features fell from ~110 to ~55, but the homography was improved significantly. Here's the stitch that results for slices 1/2 and 2/3 with the update:
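For reference, the filter behind that update is just a check on the matched keypoints' Y coordinates before calling findHomography (the 30 px threshold here is illustrative):

# kp1, kp2, matches come from the ORB matching step above
max_dy = 30  # illustrative threshold in pixels
filtered = [m for m in matches
            if abs(kp1[m.queryIdx].pt[1] - kp2[m.trainIdx].pt[1]) < max_dy]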
Until someone can tell me that I'm going about this all wrong, I'm going to keep pursuing this strategy with the following added step:
Ultimately, when it's all said and done, I want to compute a mapping from input pixels to output pixels so that we're not doing all of this complex work (lens correction, ORB, findHomography, etc.) per frame. We'll do it once per camera, save the mapping to a file somewhere, and then in real time map the input video to the output video frame-by-frame using cv2.remap.
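Something along these lines (the output size is a placeholder, and H stands for whatever composed lens-correction-plus-perspective transform we end up with per slice):

import cv2
import numpy as np

# Precompute, once per camera: for every output pixel, which input pixel it comes from.
out_h, out_w = 1080, 1920                         # placeholder output size
xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
grid = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)

src = np.linalg.inv(H) @ grid                     # output pixel -> input pixel
map_x = (src[0] / src[2]).reshape(out_h, out_w).astype(np.float32)
map_y = (src[1] / src[2]).reshape(out_h, out_w).astype(np.float32)
np.savez('camera_map.npz', map_x=map_x, map_y=map_y)   # hypothetical cache file

# Per frame, at 30 fps, this is all that's left to do:
out = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)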
The second image I posted (the "expected output") comes directly from the camera in question: it can be configured to return the first image at 30 fps or the second image at 10 fps. We want to do the stitching off-camera on a more powerful computer so that we keep 30 fps but still end up with the single stitched image.
AXIS provides an SDK for doing the stitching off-camera, but it is Windows-only, while most of our tech stack is Linux and most of our development machines run macOS. I did try the SDK on a Windows machine, but I couldn't get it to compile and run; the sample code kept throwing errors, and I've never had much luck getting Visual Studio or C++ to play nicely for me.
Upvotes: 4
Views: 3362
Reputation: 5015
My suggestion is to train an autoencoder. Use the first image as the input and the second one as the target output, as in a denoising autoencoder:
Note that you may lose resolution if the bottleneck in the middle layer is too small.
Variational autoencoders also expose a latent vector, but they work on the same principle.
You can adapt this code:
# Denoising-autoencoder-style model, updated from the old Keras 1 API
# (Conv2D instead of Convolution2D, padding/kernel_initializer instead of
# border_mode/init, epochs instead of nb_epoch).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     MaxPooling2D, Reshape, UpSampling2D)
from tensorflow.keras.optimizers import SGD

# input_shape, learning_rate, momentum, x_train_noisy and x_train are assumed
# to be defined elsewhere (x_train_noisy = distorted input, x_train = target).
denoise = Sequential()
denoise.add(Conv2D(20, (3, 3), padding='valid', input_shape=input_shape))
denoise.add(BatchNormalization())
denoise.add(Activation('relu'))
denoise.add(UpSampling2D(size=(2, 2)))
denoise.add(Conv2D(20, (3, 3), kernel_initializer='glorot_uniform'))
denoise.add(BatchNormalization())
denoise.add(Activation('relu'))
denoise.add(Conv2D(20, (3, 3), kernel_initializer='glorot_uniform'))
denoise.add(BatchNormalization())
denoise.add(Activation('relu'))
denoise.add(MaxPooling2D(pool_size=(3, 3)))
denoise.add(Conv2D(4, (3, 3), kernel_initializer='glorot_uniform'))
denoise.add(BatchNormalization())
denoise.add(Activation('relu'))
# The Reshape target must match the number of elements coming out of the last
# conv layer; (28, 28, 1) fits the original MNIST-sized example.
denoise.add(Reshape((28, 28, 1)))

# The original decay_rate can be reproduced with a learning-rate schedule if needed.
sgd = SGD(learning_rate=learning_rate, momentum=momentum, nesterov=False)
denoise.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
denoise.summary()

denoise.fit(x_train_noisy, x_train,
            epochs=50,
            batch_size=30,
            verbose=1)
Upvotes: 1