Reputation: 327
The vision system is given a single training image (e.g. a piece of 2D artwork ) and it is asked whether the piece of artwork is present in the newly captured photos. The newly captured photos can contain a lot of any other object and when the artwork is presented, it must face up but may be occluded.
The pose space is x,y,rotation and scale. The artwork may be highly symmetric or not.
What is the latest state of the art handling this kind of problem?
I have tried/considered the following options but there are some problems in all of them. If my argument is invalid please correct me.
deep learning(rcnn/yolo): a lot of labeled data are needed means a lot of human labor is needed for each new pieces of artwork.
traditional machine learning(SVM,Random forest): same as above
sift/surf/orb + ransac or voting: when the artwork is symmetric, the features matched are mostly incorrect. A lot of time is needed in the ransac/voting stage.
generalized hough transform: the state space is too large for the voting table. Pyramid can be applied but it is difficult to choose some universal thresholds for different kinds of artwork to proceed down the pyramid.
chamfer matching: the state space is too large. Too much time is needed in searching across the state space.
Upvotes: 1
Views: 219
Reputation: 2854
You can try using traditional feature based image processing algorithm which might give true positive template matches up to a descent accuracy.
Given template image as in the question:
We have 2 features now.
Scene image feature vector:
Similarly Again in the scene image use dilation followed by connected components identification, define convex hull(polygon) around each connected objects and define feature vector for each object(edge info, pixel density).
Now as usual search for template feature vector in the scene image feature vectors data with minimum feature distance(also use certain upper level distance threshold value to avoid false positive matches).
This should give the true positive matches if available in the scene image.
Exception: This method would not work for occluded objects.
Upvotes: 0
Reputation: 733
Object detection requires a lot of labeled data of the same class to generalize well, and in your setting it would be impossible to train a network with only single instance.
I assume that in your case online object trackers can work, at least give it a try. There are some convolutional object trackers that work great like Siamese CNNs. The code is open source at github, and you can watch this video to see its performance.
Online object tracking: Given the initialized state (e.g., position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the subsequent frames.-source-
Upvotes: 1