Ground-truth data collection and evaluation for computer vision

Currently I am starting to develop a computer vision application that involves tracking of humans. I want to build ground-truth metadata for videos that will be recorded in this project. The metadata will probably need to be hand labeled and will mainly consist of location of the humans in the image. I would like to use the metadata to evaluate the performance of my algorithms.

I could of course build a labeling tool using, e.g. qt and/or opencv, but I was wondering if perhaps there was some kind of defacto standard for this. I came across Viper but it seems dead and it doesn't quite work as easy as I would have hoped. Other than that, I haven't found much.

Does anybody here have some recommendations as to which software / standard / method to use both for the labeling as well as the evaluation? My main preference is to go for something c++ oriented, but this is not a hard constraint.

Kind regards and thanks in advance! Tom

Upvotes: 9

Answers (4)

sweppner

Reputation: 31

I've had the same problem looking for a tool to use for image annotation to build a ground truth data set for training models for image analysis.

LabelMe is a solid option if you need polygonal outlining for your annotation. I've worked with it before and it does the job well and has some additional cool features when it comes to 3d feature extraction. In addition to LabelMe, I also made an open source tool called LabelD. If you're still looking for a tool to do your annotation, check it out!

Upvotes: 1

Goosebumps

Reputation: 939

LabelMe is another open annotation tool. I think it is less suitable for my particular case but still worth mentioning. It seems to be oriented at blob labeling.

Upvotes: 3

Goosebumps

Reputation: 939

I've had another look at vatic and got it to work. It is an online video annotation tool meant for crowd sourcing via a commercial service and it runs on Linux. However, there is also an offline mode. In this mode the service used for the exploitation of this software is not required and the software runs stand alone.

The installation is quite elaborately described in the enclosed README file. It involves, amongst others, setting up an appache and a mysql server, some python packages, ffmpeg. It is not that difficult if you follow the README. (I mentioned that I had some issues with my proxy but this was not related to this software package).

You can try the online demo. The default output is like this:

0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"
0 298 111 318 182 2 1 0 1 "person"
0 296 110 318 181 3 1 0 1 "person"
0 294 110 318 181 4 1 0 1 "person"
0 292 109 318 180 5 1 0 1 "person"
0 290 108 318 180 6 1 0 1 "person"
0 288 108 318 179 7 1 0 1 "person"
0 286 107 317 179 8 1 0 1 "person"
0 284 106 317 178 9 1 0 1 "person"

Each line contains 10+ columns, separated by spaces. The definition of these columns are:

1   Track ID. All rows with the same ID belong to the same path.
2   xmin. The top left x-coordinate of the bounding box.
3   ymin. The top left y-coordinate of the bounding box.
4   xmax. The bottom right x-coordinate of the bounding box.
5   ymax. The bottom right y-coordinate of the bounding box.
6   frame. The frame that this annotation represents.
7   lost. If 1, the annotation is outside of the view screen.
8   occluded. If 1, the annotation is occluded.
9   generated. If 1, the annotation was automatically interpolated.
10  label. The label for this annotation, enclosed in quotation marks.
11+ attributes. Each column after this is an attribute.

But can also provide output in xml, json, pickle, labelme and pascal voc

So, all in all, this does quite what I wanted and it is also rather easy to use. I am still interested in other options though!

Upvotes: 5

killogre

Reputation: 1800

This is a problem that all practitioners of computer vision face. If you're serious about it, there's a company that does it for you by crowd-sourcing. I don't know whether I should put a link to it in this site, though.

Upvotes: 2

Ground-truth data collection and evaluation for computer vision

Answers (4)

Related Questions