Daniel Amaya

Reputation: 113

When to use supervised or unsupervised learning?

Thanks

Upvotes: 6

Views: 5212

Answers (3)

G V SRI RAJIV JEGAN

Reputation: 1

It depends on the data set you have. If you have the target feature in hand, you should go for supervised learning. If you don't, it is an unsupervised problem. Supervised learning is like teaching the model with examples. Unsupervised learning is mainly used to group similar data, and it plays a major role in feature engineering. Thank you.
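
A minimal sketch of that rule, assuming the data set is a list of dicts and the target feature is a named key. The function name `choose_approach` and the sample columns are hypothetical, chosen only for illustration.

```python
def choose_approach(rows, target):
    """Return 'supervised' if every row has a value for `target`, else 'unsupervised'."""
    has_target = bool(rows) and all(row.get(target) is not None for row in rows)
    return "supervised" if has_target else "unsupervised"


# Hypothetical data: the same measurements with and without a label column.
labeled = [
    {"height_cm": 30, "weight_kg": 4.0, "animal": "cat"},
    {"height_cm": 60, "weight_kg": 25.0, "animal": "dog"},
]
unlabeled = [
    {"height_cm": 30, "weight_kg": 4.0},
    {"height_cm": 60, "weight_kg": 25.0},
]

print(choose_approach(labeled, "animal"))    # supervised
print(choose_approach(unlabeled, "animal"))  # unsupervised
```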

Upvotes: 0

Ivan Sivak

Reputation: 7488

Depends on your needs. If you have a set of existing data that includes the target values you wish to predict (labels), then you probably need supervised learning (e.g. is something true or false? Does this data represent a fish, a cat or a dog? Simply put: you already have examples of the right answers and you are telling the algorithm what to predict).

You also need to decide whether you need classification or regression. Classification is when you need to sort the predicted values into given classes (e.g. is this person likely to develop diabetes, yes or no? In other words, discrete values), and regression is when you need to predict continuous values (1, 2, 4.56, 12.99, 23, etc.). There are many supervised learning algorithms to choose from (k-nearest neighbors, naive Bayes, SVM, ridge regression, ...).
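
To illustrate the classification/regression distinction with one of the algorithms listed above, here is a minimal pure-Python k-nearest neighbors sketch (no library assumed; in practice you would more likely use a library such as scikit-learn). The same neighbor search serves both tasks: a majority vote for discrete labels, a mean for continuous values. The function names and the toy data are made up for the example.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (features, target) pairs in `train` closest to `query`."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote over the neighbors' discrete labels."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: mean of the neighbors' continuous target values."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)


# Made-up labeled examples: (height_cm, weight_kg) -> animal.
pets = [((30, 4.0), "cat"), ((25, 3.5), "cat"), ((28, 5.0), "cat"),
        ((60, 25.0), "dog"), ((55, 20.0), "dog"), ((65, 30.0), "dog")]
print(knn_classify(pets, (29, 4.2)))    # cat

# Made-up labeled examples: (floor_area_m2,) -> price in thousands.
prices = [((50,), 150.0), ((60,), 180.0), ((70,), 210.0), ((80,), 240.0)]
print(knn_regress(prices, (65,), k=2))  # 195.0
```

The only difference between the two tasks is how the neighbors' targets are combined, which is why many supervised algorithms come in both a classifier and a regressor flavor.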

On the contrary, use unsupervised learning if you don't have the labels (or target values). You're simply trying to identify the clusters that occur in the data (e.g. k-means, DBSCAN, spectral clustering).
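
As a rough illustration of clustering without labels, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm), the first of the examples above. The function name, the fixed seed and the toy points are assumptions for the example; note that the number of clusters `k` still has to be chosen by you.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return (centroids, assignments) for unlabeled points."""
    centroids = random.Random(seed).sample(points, k)  # k randomly chosen input points
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                       for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep the old centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, assignments


# Made-up, unlabeled 2-D points: two visible groups, but no target column.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, labels = k_means(points, k=2)
print(labels)  # cluster ids; which group gets 0 and which gets 1 is arbitrary
```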

So it depends, and there's no exact answer, but generally speaking you need to:

  1. Collect and look at your data. You need to know your data; only then can you decide which way to go and which algorithm will best suit your needs.

  2. Train your algorithm. Make sure your data is clean and of good quality, and bear in mind that in the case of unsupervised learning you can skip training on target values, as you don't have any. You apply and test your algorithm right away.

  3. Test your algorithm. Run it and see how well it behaves. In the case of supervised learning, hold back part of your labeled data (a test set that is not used for training) to evaluate how well your algorithm is doing.
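
A minimal pure-Python sketch of steps 2 and 3 for the supervised case: hold back part of the labeled data, train on the rest, and measure accuracy only on the held-back part. A nearest-centroid classifier is used here purely because it is short; it is one choice among many, not a recommendation. All names and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows and hold back `test_fraction` of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_nearest_centroid(train):
    """Train step: compute the mean feature vector of each class."""
    grouped = defaultdict(list)
    for features, label in train:
        grouped[label].append(features)
    return {label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
            for label, vecs in grouped.items()}


def predict(model, features):
    """Predict the class whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(features, model[label]))


def accuracy(model, test):
    """Test step: fraction of held-out rows that are predicted correctly."""
    hits = sum(predict(model, features) == label for features, label in test)
    return hits / len(test)


# Made-up labeled data: (height_cm, weight_kg) -> animal.
data = [((30, 4.0), "cat"), ((25, 3.5), "cat"), ((28, 5.0), "cat"), ((32, 4.5), "cat"),
        ((60, 25.0), "dog"), ((55, 20.0), "dog"), ((65, 30.0), "dog"), ((58, 22.0), "dog")]

train, test = train_test_split(data)
model = fit_nearest_centroid(train)  # the test rows are never seen here
print(accuracy(model, test))         # 1.0 on this well-separated toy data
```

Evaluating on rows the model was trained on would overstate how well it does on unseen data, which is why the split happens before training.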

There are many books online about machine learning and many online lectures on the topic as well.

Upvotes: 1

petezurich

Reputation: 10174

  1. If you have a labeled dataset, you can use both. If you have no labels, you can only use unsupervised learning.

  2. It's not a question of "better". It's a question of what you want to achieve. E.g. clustering data is usually unsupervised: you want the algorithm to tell you how your data is structured. Categorizing is supervised, since you need to teach your algorithm what is what in order to make predictions on unseen data.

  3. See 1.

On a side note: These are very broad questions. I suggest you familiarize yourself with some ML foundations.

A good podcast, for example: http://ocdevel.com/podcasts/machine-learning

Very good book / notebooks by Jake VanderPlas: http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb

Upvotes: 6
