hitechnet

Reputation: 45

Creating supervised model in machine learning

I have recently learned how supervised learning works: it learns from a labeled dataset and then predicts labels for unlabeled data.

But I have a question: is it fine to train the model further on the data it has predicted itself, then predict the remaining unlabeled data again, and repeat the process?

For example, model M is trained on a labeled dataset D of 10 examples, and then M predicts a label for datum A. Datum A, with its predicted label, is added to dataset D and model M is retrained. The process is repeated for the remaining unlabeled data.

Upvotes: 3

Views: 218

Answers (2)

lejlot

Reputation: 66825

What you are describing here is a well-known technique called (among other names) "self-training" or "semi-supervised self-training". See for example these slides: https://www.cs.utah.edu/~piyush/teaching/8-11-print.pdf. There are hundreds of modifications built around this idea. Unfortunately, in general it is hard to prove that it should help, so while it will help for some datasets it will hurt others. The main criterion here is the quality of the very first model, since self-training rests on the assumption that your original model is really good, so you can trust it enough to label new examples. It might help with slow concept drift given a strong model, but it will fail miserably with weak models.
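The loop the answer describes can be sketched in plain Python. This is a minimal, self-contained illustration (using a toy nearest-centroid classifier and a hypothetical confidence margin, not any particular library's API): fit on the labeled data, pseudo-label only the unlabeled points the model is confident about, and refit.

```python
import math

def centroid_fit(X, y):
    """Fit a toy nearest-centroid classifier: mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x):
    """Return (label, margin); a larger margin means a more confident prediction."""
    dists = {c: math.dist(x, m) for c, m in model.items()}
    best = min(dists, key=dists.get)
    others = [d for c, d in dists.items() if c != best]
    if not others:                       # only one class seen so far
        return best, float("inf")
    return best, min(others) - dists[best]

def self_train(X_lab, y_lab, X_unlab, threshold=1.0, max_rounds=10):
    """Self-training: repeatedly pseudo-label confident points and refit.

    `threshold` controls how much we trust the current model -- exactly the
    point the answer makes: with a weak initial model, confident-looking
    pseudo-labels can still be wrong and the errors compound.
    """
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = centroid_fit(X_lab, y_lab)
        keep, added = [], False
        for x in X_unlab:
            label, margin = centroid_predict(model, x)
            if margin >= threshold:      # trust only confident predictions
                X_lab.append(x)
                y_lab.append(label)
                added = True
            else:
                keep.append(x)
        X_unlab = keep
        if not added or not X_unlab:
            break
    return centroid_fit(X_lab, y_lab)
```

A usage sketch: `self_train([[0,0],[0,1],[4,4],[4,5]], [0,0,1,1], [[0.5,0.5],[4.5,4.5]])` absorbs both unlabeled points into their nearby clusters before the final refit.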

Upvotes: 2

Atilla Ozgur

Reputation: 14721

What you describe is called online machine learning, incremental supervised learning, or updateable classification. There are a bunch of algorithms that accomplish this behavior. See for example the Weka toolbox's Updateable Classifiers; I suggest looking at the following ones:

  • HoeffdingTree
  • IBk
  • NaiveBayesUpdateable
  • SGD
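The common trait of the classifiers listed above is that they expose a single-example update step rather than requiring a full retrain. A minimal sketch of that idea, assuming a plain binary perceptron (this is an illustration of the updateable-classifier pattern, not Weka's Java API):

```python
class OnlinePerceptron:
    """Binary perceptron that can be updated one example at a time,
    in the spirit of an updateable/online classifier interface."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features   # weight vector
        self.b = 0.0                  # bias term
        self.lr = lr                  # learning rate

    def predict(self, x):
        """Return the predicted class in {-1, +1}."""
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def update(self, x, y):
        """Incremental step for one labeled example (y in {-1, +1}).

        The model changes only on a mistake, so it can consume a
        stream of examples without ever retraining from scratch.
        """
        if self.predict(x) != y:
            for i, xi in enumerate(x):
                self.w[i] += self.lr * y * xi
            self.b += self.lr * y
```

Usage: feed examples as they arrive, e.g. `clf.update([2, 2], 1)` for each new labeled point in the stream. Note this is a slightly different setting from the question: online learning updates on *ground-truth* labels arriving over time, whereas the question asks about retraining on the model's *own predictions*.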

Upvotes: -1
