Reputation: 649
I am using this dataset: http://www.robots.ox.ac.uk/~vgg/data/hands/
However, I am only going to use hands that are larger than 4200 sq. pixels in area, which leaves me with just 621 hands. I realise this may not be enough, and I will look for more images if need be.
All the images are annotated: for each hand I have the corner coordinates of its bounding box.
However, the bounding box is NOT aligned with the x and y axes.
I believe I have to crop the hands out of the image. I have 2 ways of doing this:
1) Let xmin and xmax be the minimum and maximum x coordinates of the bounding box, and ymin and ymax the minimum and maximum y coordinates. If I crop the image from (xmin, ymin) to (xmax, ymax) (without the bounding box drawn on it, of course), some of the background remains in the crop, because the rotated box does not fill its axis-aligned enclosure (see the sketch after this list).
2) I can use a binary mask to keep only the pixels INSIDE the rotated bounding box. The crop will still run from xmin to xmax and ymin to ymax, but I can set everything outside the box to white. Both options are sketched in the code below.
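For reference, here is a minimal sketch of both options using OpenCV and NumPy. I am assuming the annotation gives the four (x, y) corner points of the rotated bounding box; the file name and corner values below are made up for illustration.

```python
import cv2
import numpy as np

def crop_axis_aligned(image, corners):
    # Option 1: crop the axis-aligned rectangle that encloses the
    # rotated bounding box; some background stays in the crop.
    xmin, ymin = corners.min(axis=0).astype(int)
    xmax, ymax = corners.max(axis=0).astype(int)
    return image[ymin:ymax, xmin:xmax]

def crop_masked(image, corners):
    # Option 2: same crop size, but pixels outside the rotated
    # bounding box are set to white via a binary mask.
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, corners.astype(np.int32), 255)
    out = image.copy()
    out[mask == 0] = 255  # white background
    xmin, ymin = corners.min(axis=0).astype(int)
    xmax, ymax = corners.max(axis=0).astype(int)
    return out[ymin:ymax, xmin:xmax]

# Hypothetical usage -- the corner coordinates would come from the
# dataset's annotation files, not hard-coded like this.
image = cv2.imread("hand.jpg")
corners = np.array([[120.0, 80.0], [210.0, 95.0],
                    [195.0, 190.0], [105.0, 175.0]])
crop1 = crop_axis_aligned(image, corners)
crop2 = crop_masked(image, corners)
```

Note that fillConvexPoly assumes the four corners form a convex polygon, which a rotated rectangle always does.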
Which would be better? I believe the positive samples are supposed to contain only the object of interest (hands, in this case), so leaving the background in may not be right. However, would a white background be OK?
The main problem here is that the bounding box is not aligned!
Upvotes: 1
Views: 1170
Reputation: 163
You might also want to have a look at LabelMe: http://labelme.csail.mit.edu/Release3.0/index.php
I used it for my project, and they also show you how to use Amazon Mechanical Turk to build your own datasets. I think you might be able to use their datasets as well.
Upvotes: 0
Reputation: 6666
Leaving a small amount of background in is OK: it will be different in each image, so it will not be learned as part of the classifier.
I would suggest making a bigger negative set to cancel out the background; your 620 images will be fine.
Upvotes: 0