Reputation: 107
I am a beginner in machine learning and recently read about supervised and unsupervised machine learning. It looks like supervised learning is synonymous with classification and unsupervised learning is synonymous with clustering. Is that so?
Upvotes: 0
Views: 2743
Reputation: 77454
Supervised learning is when you have labeled training data. In other words, you have a well-defined target to optimize your method for.
Typical supervised learning tasks are classification and regression: learning to predict categorical values (classification), numerical values (regression), or ranks (learning to rank).
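As a minimal illustration of this setting (a sketch using scikit-learn on made-up toy data), both tasks receive explicit targets y to fit against:

```python
# A minimal sketch of the supervised setting: the method is handed
# explicit targets y to optimize against. (scikit-learn, toy data)
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # features
y_class = np.array([0, 0, 1, 1])            # categorical targets: classification
y_reg = np.array([1.1, 1.9, 3.2, 3.9])      # numerical targets: regression

clf = LogisticRegression().fit(X, y_class)  # learns to predict categories
reg = LinearRegression().fit(X, y_reg)      # learns to predict numbers

print(clf.predict([[2.5]]), reg.predict([[2.5]]))
```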
Unsupervised learning is an odd term, because most of the time the methods aren't "learning" anything. What would they learn from? You don't have labeled training data.
There are plenty of unsupervised methods that don't fit the "learning" paradigm well. This includes dimensionality reduction methods such as PCA (which predates any "machine learning" by far: PCA was proposed in 1901, long before the computer!). Many of these are just data-driven statistics (as opposed to parametric statistics). This includes most cluster analysis methods, outlier detection, etc. To understand these, it's better to step out of the "learning" mindset. Many people have trouble understanding these approaches because they always think in the "minimize objective function f" mindset common in learning.
Consider for example DBSCAN, one of the most popular clustering algorithms. It does not fit the learning paradigm well, but it can nicely be interpreted as a graph-theoretic construct: (density-)connected components. It doesn't optimize any objective function; it computes the transitive closure of a relation, and there is no function maximized or minimized.
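To make that concrete, here is a minimal sketch of this graph-theoretic view (brute-force NumPy on in-memory data, not a real implementation; the eps and min_pts defaults are placeholders a user would choose). Core points are linked into density-connected components by breadth-first search, and no objective function appears anywhere:

```python
# A sketch of DBSCAN as connected components of a density graph.
# Core points have >= min_pts neighbors within eps; clusters are the
# connected components over core points, with border points attached.
import numpy as np
from collections import deque

def dbscan_sketch(X, eps=0.5, min_pts=3):
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = np.full(n, -1)  # -1 = noise
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Grow one density-connected component by breadth-first search.
        queue = deque([i])
        labels[i] = cluster
        while queue:
            j = queue.popleft()
            if not core[j]:
                continue  # border points don't expand the component
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    queue.append(k)
        cluster += 1
    return labels
```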
Similarly, APRIORI finds frequent itemsets: combinations of items that occur at least minsupp times, where minsupp is a user parameter. It's an extremely simple definition, but the search space can be painfully large when you have large data, and the brute-force approach just doesn't finish in acceptable time. So APRIORI uses a clever search strategy to avoid unnecessary hard disk accesses, computations, and memory use. But there is no "worse" or "better" result as in learning: either the result is correct (complete) or it is not. There is nothing to optimize on the result (only on the algorithm's runtime).
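As a toy illustration (pure Python on in-memory data, so it skips the disk-access cleverness that is APRIORI's actual point), the level-wise search with subset pruning looks like this:

```python
# A sketch of APRIORI's search strategy: only extend itemsets whose
# subsets were themselves frequent, pruning the search space.
from itertools import combinations

def apriori_sketch(transactions, minsupp):
    transactions = [frozenset(t) for t in transactions]
    def support(itemset):
        return sum(itemset <= t for t in transactions)
    items = {item for t in transactions for item in t}
    frequent = [{frozenset([i]) for i in items
                 if support(frozenset([i])) >= minsupp}]
    while frequent[-1]:
        prev = frequent[-1]
        # Join frequent k-itemsets into (k+1)-candidates, then prune
        # any candidate with an infrequent subset (the APRIORI trick).
        candidates = {a | b for a in prev for b in prev
                      if len(a | b) == len(a) + 1}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, len(c) - 1))}
        frequent.append({c for c in candidates if support(c) >= minsupp})
    return [s for level in frequent for s in level]

# Example: with minsupp=2, {beer, chips} is a frequent itemset.
print(apriori_sketch([{"beer", "chips"},
                      {"beer", "chips", "salsa"},
                      {"milk"}], minsupp=2))
```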
Calling these methods "unsupervised learning" squeezes them into a mindset they don't belong in. They are not "learning" anything: neither optimizes a function, uses labels, nor uses any kind of feedback. They just SELECT a certain set of objects from the database: APRIORI selects columns that frequently have a 1 at the same time; DBSCAN selects connected components in a density graph. Either the result is correct, or it is not.
Some (but by far not all) unsupervised methods can be formalized as an optimization problem, at which point they become similar to popular supervised learning approaches. For example, k-means is a minimization problem, and PCA is a minimization problem too (closely related to linear regression, actually). But it is the other way around: many machine learning tasks are transformed into an optimization problem and solved with general-purpose optimization tools, which just happen to be highly popular in machine learning (e.g. linear programming). All the "learning" is then wrapped into the way the data is transformed prior to feeding it into the optimizer. And in some cases, such as PCA, a non-iterative way to compute the optimum solution was found (in 1901), so in these cases you don't need the usual optimization hammer at all.
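To illustrate the PCA point: the closed-form solution is just an eigendecomposition of the covariance matrix. This sketch (NumPy, synthetic data) finds the optimum of the variance-maximization problem without any iterative optimizer:

```python
# A sketch of PCA's closed-form solution: the optimization problem
# ("find directions of maximum variance") is solved directly by
# eigendecomposition -- no gradient descent, no iteration.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with very different variance per axis.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # closed form

# Principal components = eigenvectors, sorted by descending variance.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]
projected = Xc @ components[:, :2]      # 2D approximation of the data
```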
Upvotes: 1
Reputation: 6069
No.
Supervised learning is when you know the correct answers (targets). Depending on their type, the task might be classification (categorical targets), regression (numerical targets), or learning to rank (ordinal targets). (This list is by no means complete; there may be other types that I either forgot or am unaware of.)
In contrast, in the unsupervised learning setting we don't know the correct answers, and we try to infer or learn some structure from the data: be it cluster assignments or a low-dimensional approximation (dimensionality reduction; actually, one might think of clustering as an extreme one-dimensional case of dimensionality reduction). Again, this is far from complete, but the general idea is hidden structure that we try to discover from the data.
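A minimal sketch of the contrast (scikit-learn, toy data): the clustering method only ever sees X, and there is no y to check answers against; the number of clusters is a choice we make, not an answer we know:

```python
# Unsupervised setting: structure is inferred from X alone.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: structure discovered, not taught
```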
Upvotes: 3