What is the most efficient way to achieve fast and error-free object tracking?

Question

After spending some time learning basic computer vision concepts and techniques, I started to notice how unreliable simple scripts become when the lighting or scale changes, and how resource-intensive it is to use more advanced solutions such as building a well-made Haar cascade or a HOG-feature-based SVM. Furthermore, even more advanced methods involving machine learning usually take a lot of time and GPU hours to produce a high-quality model.

Recently, while looking through YouTube, I found a lot of so-called VTubers who use various software to control virtual avatars with fairly precise motion tracking and what seems to be no errors whatsoever. While this is not unimaginable, the number of people using such software, and the number of such programs, seems to be rather large.

Planning to investigate further, I looked into how similar technology works, but so far I have only found complex solutions involving either AI-driven models or positional sensors attached to the user's body. Still, it's hard to believe all of those people go to such lengths, so I realised that perhaps this can be accomplished with some CV solution that is relatively light on resources. So far I have looked into different ways to "map" model joints to human ones. On my own I tried basic contour matching, plus green-screen filtering to avoid errors. While I managed to remove almost all errors, there were still moments when the mapping snapped, for example, the arm to the elbow.

How exactly is object recognition and motion tracking of such quality achieved using only computer vision?

j2abro · Accepted Answer

I'd recommend looking at the OpenCV Tracking API. It implements various tracking algorithms out of the box. Here is a good introduction to object tracking in OpenCV that would be a good starting point. These approaches are fast and efficient, but they only address the tracking part of your question.

Where object detection factors in (as in AI/ML, so maybe that goes beyond the 'computer vision' component of your question) is in identifying the object you want to track in the first place. A tracker on its own has to be told which object to follow; object detection automates that. However, object detection on discrete frames doesn't necessarily associate objects across frames. For example, in video frame 1 you detect a vehicle, then in video frame 2 you also detect a vehicle: is it the same object or a different one? In this context object detection and tracking work together: the detector finds objects in each frame, and the tracker associates them across frames by assigning each one a unique ID.
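To make the association step concrete, here is a minimal sketch of the idea. It is not SORT itself (which adds a Kalman filter and Hungarian assignment), just a toy illustration: the `GreedyIouTracker` class, its `iou_threshold` parameter, and the sample boxes are all made up for this example. A detection inherits the ID of the previous-frame box it overlaps most, measured by intersection over union (IoU); anything unmatched gets a new ID.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Hypothetical toy tracker: carries an ID forward when boxes overlap enough."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track ID -> box seen in the previous frame
        self.next_id = 1

    def update(self, boxes):
        """Return one track ID per input box, in the same order."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(tbox, box), tid, i)
             for tid, tbox in self.tracks.items()
             for i, box in enumerate(boxes)),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used = set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same object
            if tid in used or ids[i] is not None:
                continue
            ids[i] = tid
            used.add(tid)
        # Unmatched detections start new tracks.
        for i in range(len(boxes)):
            if ids[i] is None:
                ids[i] = self.next_id
                self.next_id += 1
        # Keep only tracks seen in this frame (no memory of lost objects).
        self.tracks = dict(zip(ids, boxes))
        return ids


tracker = GreedyIouTracker()
frame1 = [(10, 10, 50, 50), (200, 200, 260, 260)]
frame2 = [(205, 202, 265, 262), (12, 11, 52, 51), (400, 400, 420, 420)]
ids1 = tracker.update(frame1)  # [1, 2]
ids2 = tracker.update(frame2)  # [2, 1, 3]
```

Both objects keep their IDs in the second frame even though the detector reported them in a different order, and the third box is recognised as new. A real tracker also predicts motion and tolerates missed detections, which is what the toy version leaves out.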

Below is an example using the SORT multi-object tracking algorithm, a fast, easy-to-implement tracker that works in conjunction with ML-based object detection:

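import cv2
from sort import Sort

# Get an instance of the tracking algorithm
tracker = Sort()

cap = cv2.VideoCapture(0)
while True:
    # Get a frame from the video; stop when no frame is available
    ret, video_frame_image = cap.read()
    if not ret:
        break

    # Get detections from some ML model (`model` is a placeholder
    # for your own detector). Sort.update expects a NumPy array with
    # one row per detection: x1, y1, x2, y2, score.
    detections = model.predict(video_frame_image)

    # Get tracked boxes; each returned row is x1, y1, x2, y2
    # followed by the tracking ID in the last column
    tracking_ids = tracker.update(detections)
....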

I hope that helps. One other comment: AI/ML models may be efficient enough for what you're trying to achieve, depending on the resource limitations of your use case. As an example, I've successfully run the PoseNet model in real time on a mobile phone without any problem.
