Kalman filters for Multiple Object Tracking in videos

Question

From what I've understood, tracking algorithms predict where a given object will be in the next frame (once object detection has been performed on the current one). The object is then detected again in the next frame. What isn't clear is how the tracker knows that the object in the second frame is the same as the one in the first, especially when there are multiple objects in the frame.

I've seen in a few places that a cost matrix is created using Euclidean distance between the prediction and all detections, and the problem is framed as an assignment problem (Hungarian algorithm).

Is my understanding of tracking correct? Are there other ways of establishing that an object in one frame is the same as an object in the next frame?

Dima · Accepted Answer

Your understanding is correct. You have described a simple cost function, which is likely to work well in many situations. However, there will be times when it fails.
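As a minimal sketch of the baseline described in the question, assuming each object is reduced to a 2-D centroid: build the Euclidean cost matrix between predicted positions and detections, then pick the one-to-one assignment with the lowest total cost. The function names and sample coordinates are made up for illustration. To keep the example dependency-free, the assignment is found by exhaustive search rather than the Hungarian algorithm; it gives the same optimum on small inputs, but in practice you would use a real solver such as scipy.optimize.linear_sum_assignment.

```python
import math
from itertools import permutations


def euclidean_cost_matrix(predictions, detections):
    """cost[i][j] = distance from the predicted position of track i to detection j."""
    return [[math.dist(p, d) for d in detections] for p in predictions]


def assign_brute_force(cost):
    """Return the minimum-total-cost one-to-one assignment as (track, detection) pairs.

    Exhaustive search: only practical for a handful of objects, and it assumes
    there are at least as many detections as tracks.
    """
    n_tracks, n_dets = len(cost), len(cost[0])
    best = min(
        permutations(range(n_dets), n_tracks),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(enumerate(best))


# Hypothetical data: Kalman-predicted centroids for tracks 0 and 1,
# and three centroids detected in the new frame.
predictions = [(10.0, 10.0), (50.0, 40.0)]
detections = [(52.0, 41.0), (11.0, 9.0), (90.0, 90.0)]

cost = euclidean_cost_matrix(predictions, detections)
matches = assign_brute_force(cost)
print(matches)  # track 0 pairs with detection 1, track 1 with detection 0
```

Detection 2 is left unmatched here; a tracker would typically treat it as a candidate for a new track.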

If you have the computational resources, you can make your tracker more robust by using a richer cost function.

The simplest thing you can do is take into account the error covariance of the Kalman filter, rather than just using the Euclidean distance. See the distance equation in the documentation for the vision.KalmanFilter object in MATLAB. Also see the Motion-based Multiple Object Tracking example.
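To illustrate why the covariance matters, here is a rough sketch using the squared Mahalanobis distance, one common covariance-aware distance. It is not meant to reproduce the exact equation in the vision.KalmanFilter documentation. The matrix S stands for the filter's predicted measurement covariance, and both it and the coordinates are invented values.

```python
def mahalanobis_sq(residual, S):
    """Squared Mahalanobis distance r^T S^-1 r for a 2-D residual and a symmetric 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def residual(detection, predicted):
    """Difference between a detection and the predicted position."""
    return (detection[0] - predicted[0], detection[1] - predicted[1])


predicted = (50.0, 40.0)
# Assumed covariance: uncertain along x (std 5 px), confident along y (std 1 px).
S = [[25.0, 0.0], [0.0, 1.0]]

det_a = (54.0, 40.0)  # 4 px away along the uncertain axis
det_b = (50.0, 43.0)  # 3 px away along the confident axis

d_a = mahalanobis_sq(residual(det_a, predicted), S)
d_b = mahalanobis_sq(residual(det_b, predicted), S)
print(d_a, d_b)
```

By Euclidean distance det_b is the closer detection (3 px versus 4 px), but once the covariance is taken into account det_a is the far more plausible match, because the filter expects most of its error along x.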

You can also include other information in the cost function. You could account for the fact that the size of the object should not change too much between frames, or that the object's appearance should stay the same. For example, you could compute color histograms of your detections, and define your cost function as a weighted sum of the "Kalman filter distance" and some distance between color histograms.
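A weighted-sum cost of that kind might look like the sketch below. The histogram distance used here (one minus histogram intersection), the three-bin histograms, and the weights are all illustrative assumptions; the weights in particular would need tuning for your data.

```python
def histogram_distance(h1, h2):
    """1 - histogram intersection, for histograms normalised to sum to 1 (0 means identical)."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def combined_cost(motion_dist, track_hist, det_hist, w_motion=1.0, w_appearance=10.0):
    """Weighted sum of a motion distance and an appearance distance (weights are hypothetical)."""
    return w_motion * motion_dist + w_appearance * histogram_distance(track_hist, det_hist)


# Hypothetical 3-bin color histograms.
track_hist = [0.7, 0.2, 0.1]  # appearance stored for the track
det_near = [0.1, 0.2, 0.7]    # nearby detection with a very different color
det_far = [0.7, 0.2, 0.1]     # slightly farther detection with matching color

cost_near = combined_cost(1.0, track_hist, det_near)  # motion distance 1.0
cost_far = combined_cost(2.0, track_hist, det_far)    # motion distance 2.0
print(cost_near, cost_far)
```

With these weights the matching-color detection wins despite being farther away. The combined values would fill the same cost matrix that is passed to the assignment step.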
