palau1

Reputation: 21

Machine learning - training step

When you're using Haar-like features as training data for AdaBoost, how do you build your data sets? Do you literally have to find thousands of positive and negative samples? There must be a more efficient way of doing this...

I'm trying to analyze images in MATLAB (not faces) and am relatively new to image processing.

Upvotes: 2

Views: 564

Answers (4)

efn

Reputation: 55

The only reason to have roughly equal numbers of positive and negative samples is to avoid bias. With an imbalanced set, you might get high accuracy from a classifier that completely fails on one category. To evaluate such classifiers, precision and recall are more useful than accuracy.
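To illustrate the point about accuracy hiding a failed class, here is a minimal sketch in plain Python. The data is a made-up toy set (95 negatives, 5 positives) and the "classifier" is a hypothetical one that always predicts the majority class; the helper names are my own, not from any library.

```python
# Hypothetical toy data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
# Precision is undefined when nothing is predicted positive; report 0.0 here.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy looks high because the negatives dominate, while recall shows that no positive sample was found at all.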

Upvotes: 0

Aadeshnpn

Reputation: 31

Yes, you need many positive and negative samples for training, and collecting that data is tedious. You can make it easier by recording videos instead of taking individual pictures and using ffmpeg to convert those videos into frames. This makes building the training set much easier.
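As a rough sketch of the video-to-frames idea, the snippet below builds an ffmpeg command from Python. The file names, the output pattern, and the helper names `build_ffmpeg_command` and `extract_frames` are assumptions for illustration; it also assumes ffmpeg is installed and on the PATH. The `fps` filter is used to keep only a few frames per second, since consecutive video frames tend to be nearly identical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, fps=2):
    """Build an ffmpeg command that writes `fps` frames per second as PNGs."""
    pattern = str(Path(out_dir) / "frame_%05d.png")  # hypothetical naming scheme
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def extract_frames(video_path, out_dir, fps=2):
    """Run ffmpeg to dump frames; requires ffmpeg on the PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)


# Show the command without running it ("clip.mp4" is a placeholder file name).
print(" ".join(build_ffmpeg_command("clip.mp4", "frames", fps=2)))
```

The extracted frames still have to be cropped and labeled as positive or negative samples by hand or with a separate tool.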

Upvotes: 0

William Fang

Reputation: 46

Undoubtedly, more data means more information and a better result, so you should include as much data as possible. However, one thing to watch is the ratio of positive to negative samples. For logistic regression, the ratio should not exceed 1:5. For AdaBoost, I'm not sure what the limit is, but the result will certainly change with the ratio (I have tried this before).

Upvotes: 0

Dima

Reputation: 39389

Yes, you do need many positive and negative samples for training. This is especially true for AdaBoost, which works by repeatedly reweighting (or, in some variants, resampling) the training set. How many samples are enough is hard to say, but generally the more the better, because that increases the chances of your training set being representative.
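For context on why the training set matters so much here: in the standard discrete formulation, AdaBoost keeps one weight per training sample and raises the weights of the samples the current weak learner gets wrong (some variants instead draw a resample according to those weights). Below is a minimal sketch of a single round's weight update, assuming labels of +1/-1 and weights that sum to 1. The function name `adaboost_round` and the toy labels are made up for illustration.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost weight update; labels and predictions are +1 / -1."""
    # Weighted error of the weak learner on the current distribution.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clamp to avoid division by zero or log(0) for perfect/useless learners.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


# Toy example: five samples, the weak learner gets only the last one wrong.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries far more weight than any correctly classified one, so the next weak learner is pushed to focus on it. With few samples, a handful of hard or mislabeled examples can dominate this process.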

Also, it seems to me that your quest for efficiency is misplaced. Training is done ahead of time, presumably offline. It is the efficiency of classifying unknown instances after training that people usually worry about.

Upvotes: 4
