Reputation: 135
I know this question has been asked before, but I haven't found an answer I can work with yet. I'm new to Python and TensorFlow, but I managed to get my accuracy up to about 99.3% on the MNIST image set. Now I would like to try using my own images, but this has proved more difficult than I expected.
I have read the tutorial page on the TensorFlow site hundreds of times, but it just doesn't make sense to me, and whatever I try just ends in warnings. Now I want to figure it out myself: does anyone have an idea which approach would be the easiest for working with my own images? Or any examples? I've found thousands of them online, but none is explained in a way I can understand.
Thanks for your help in advance.
Upvotes: 4
Views: 1580
Reputation: 21917
OK, so putting this together: you have 42 classes and expect to have approximately 10 pictures of each.
This places you pretty squarely in need of two things:
The first is data augmentation. You've already addressed the likely need for it in the comments, and you're spot on: to make the most of your 10 images per class, you'll want to apply a whole bunch of transformations to them, producing many more than the 10-20 original images per class.
A good example of data augmentation for image classification is in the official ResNet example model.
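As a minimal sketch (assuming TensorFlow 2 and float images in [0, 1]; the 28x28 size and specific distortion parameters are just illustrative), a per-image augmentation function might look like:

```python
import tensorflow as tf

def augment(image):
    """Apply random distortions to one float32 image tensor of shape (28, 28, 3)."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    # Crop a random 24x24 patch and resize back, so the content shifts a little
    # each time the image is seen during training.
    image = tf.image.random_crop(image, size=[24, 24, 3])
    image = tf.image.resize(image, [28, 28])
    # Brightness/contrast can push values outside [0, 1]; clip back.
    return tf.clip_by_value(image, 0.0, 1.0)
```

Because the distortions are random, mapping this over your dataset effectively gives you a fresh variant of each image every epoch.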
The second is transfer learning. When you're trying to learn a model for 42 classes from so little data, you'll probably do better by starting from a model trained on other data and then retraining the last (few) layers on your new dataset. The reasoning is that the much larger example space of the initial training helps the classifier learn a variety of common image features, which your transfer-learned classifier can reuse to reach higher-level recognition more quickly.
An alternative, of course, is some form of active learning: train a classifier, show it images of your tokens (perhaps via webcam, classifying each frame), and whenever it gets one wrong, save that image as an example for the next training round. This takes more work and you'd have to build some infrastructure for it, so I'd start with transfer learning.
You then have the question of what architecture to start with for the transfer learning. Inception is probably too much for what you're doing, but the stock MNIST model is probably simpler than you want. You'll need to do some experimentation: a modified LeNet-style classifier like the common MNIST examples can work pretty well (add another convolutional layer, add batch normalization, and maybe a bit of dropout). Alternatively, you could start with a pretrained MobileNet and transfer-learn from it. I'd be tempted to start with the latter, unless you have strict limits on inference speed.
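The MobileNet route can be sketched with the Keras applications API roughly like this (a sketch, not a tuned setup: the 128x128 input size, dropout rate, and optimizer are arbitrary choices; `weights=None` keeps the sketch offline-friendly, but in practice you'd pass `weights="imagenet"` to actually get the pretrained features):

```python
import tensorflow as tf

# Pretrained convolutional base, with the ImageNet classification head removed.
base = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3),
    include_top=False,
    weights=None,      # use weights="imagenet" in practice for transfer learning
    pooling="avg")     # global average pooling -> one feature vector per image
base.trainable = False  # freeze the pretrained layers; train only the new head

# New head: a single softmax layer for your 42 token classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(42, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Once the new head has converged, you can optionally unfreeze the last few layers of the base and fine-tune with a low learning rate.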
For your images, I'd start by creating a directory of JPEG images. The most "official" way to handle them would be the process in this answer.
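If you go the JPEG-directory route, a simple sketch with `tf.data` might look like the following (assuming a hypothetical layout of `images_dir/<class_name>/<file>.jpg`, one subdirectory per class; the image size and batch size are placeholders):

```python
import tensorflow as tf

def make_dataset(images_dir, image_size=(128, 128), batch_size=32):
    """Build a (image, label) dataset from images_dir/<class_name>/*.jpg."""
    paths = sorted(tf.io.gfile.glob(images_dir + "/*/*.jpg"))
    # Derive integer labels from the subdirectory names.
    class_names = sorted({p.split("/")[-2] for p in paths})
    labels = [class_names.index(p.split("/")[-2]) for p in paths]

    def load_example(path, label):
        raw = tf.io.read_file(path)
        image = tf.io.decode_jpeg(raw, channels=3)
        image = tf.image.resize(image, image_size)
        return image / 255.0, label  # scale pixels to [0, 1]

    return (tf.data.Dataset.from_tensor_slices((paths, labels))
            .shuffle(max(len(paths), 1))
            .map(load_example)
            .batch(batch_size))
```

You'd then apply your augmentation function with another `.map(...)` between `map(load_example)` and `batch(...)`.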
Upvotes: 3