Reputation: 3213
I am trying to do image classification with TensorFlow. Right now I am hand-collecting and labeling training data, but it is pretty tedious, slow, and painful. Currently, with my hand-gathered training data, my model predicts the correct class in an image about 57% of the time; with there being 6 different classes, it is obviously doing better than randomly guessing.
Anyway, I was wondering: if my classifier is correct 57% of the time, would it be feasible to use it to label new training data, so as to automate the collecting and labeling of training data? Obviously, this training data would not be labeled perfectly; in fact, it would only be labeled with around 57% accuracy. Would this still work? Would it help the accuracy of the model, have no effect, or hurt it? It seems an interesting thought experiment:
if Z is the accuracy of the classifier that is labeling new training data, N is the number of training examples we have, and G is the accuracy of our model when applied to new non-training data, what is the limit of G as N approaches infinity, and how does it depend on Z?
Upvotes: 0
Views: 73
Reputation: 20130
Your approach should give no benefit, since you would only be training on what your model already thinks it knows: every example it would have classified correctly gets a correct label, and every example it would have misclassified gets a wrong label. So after retraining on data auto-labeled by your current classifier, you should end up with nearly the same classifier.
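You can see this fixed-point effect with a toy experiment. The sketch below (illustrative, using a simple nearest-centroid classifier in NumPy rather than a TensorFlow model) trains on true labels, auto-labels the data with the trained model, retrains on those self-generated labels, and then measures how often the old and new classifiers agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two overlapping Gaussian blobs, so the classifier
# makes some mistakes (analogous to your 57%-accurate model).
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    return np.vstack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each point to the class of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

c_old = fit_centroids(X, y)        # train on true labels
pseudo = predict(c_old, X)         # auto-label the data with the model itself
c_new = fit_centroids(X, pseudo)   # retrain on the self-generated labels

agreement = (predict(c_old, X) == predict(c_new, X)).mean()
print(f"old vs. new classifier agreement: {agreement:.2%}")
```

The retrained model agrees with the original almost everywhere, including on the examples the original got wrong, which is exactly why self-labeling alone does not add information.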
But: it is often easier to check and fix pre-annotated data than to label everything completely manually. If that is the case for your task, you can use your classifier to pre-sort the data and then check+fix the labels manually. Train on that to improve your classifier => pre-sorting of new data gets better => less time to check+fix => less time to further improve the classifier, and so on...
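The pre-sort step might look like the sketch below: copy each image into a folder named after its predicted class, so a human only has to move the misfiled ones. Here `predict_class` is a placeholder for whatever your trained model exposes; the folder layout is just one possible convention:

```python
import os
import shutil

def presort(image_paths, predict_class, out_dir="presorted"):
    """Copy each image into a subfolder of out_dir named after its
    predicted class, ready for manual check+fix in an image viewer.

    predict_class: a callable mapping an image path to a class label
                   (stand-in for your trained model's prediction).
    """
    for path in image_paths:
        cls = predict_class(path)
        dest = os.path.join(out_dir, str(cls))
        os.makedirs(dest, exist_ok=True)
        shutil.copy(path, dest)
```

After fixing the labels by hand, the corrected folders become your next (larger) training set.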
A nice tool for the check+fix step is IrfanView:
Upvotes: 1