xiaohan2012

Reputation: 10342

Statistics, machine learning and data mining

I am currently learning data mining and I have the following questions.

  1. What is the relationship between machine learning and data mining?
  2. Many data mining techniques seem to be associated with statistics, yet I also hear that data mining has a lot to do with machine learning. So my question is: is machine learning closely related to statistics?
  3. If they are not closely related, is there a division between data mining that focuses on statistical techniques and data mining that focuses on machine learning techniques? I ask because the statistics departments of some graduate schools offer data mining courses.

Upvotes: 1

Views: 2273

Answers (4)

Gorani Rasani Kurdi

Reputation: 11

Although data mining and machine learning overlap, we can distinguish between them. Put simply: data mining searches for patterns in order to predict and/or describe huge data sets, while machine learning goes further and uses these patterns to learn. Both are based on statistics.

Upvotes: 1

Azim

Reputation: 1724

A comprehensive answer was already given by @SpeedBirdNine. As a side note:

  • Data mining and machine learning are mainly based on the old but ingenious ideas of statisticians (inferential statistics, decision theory, etc.).
  • Classic statistics + today's powerful computers = DM & ML
  • Since we live in the era of big data, the barrier statisticians used to face, namely the lack of enough data, is often no longer an issue. Therefore, in many cases (but of course not all), it is safe to say that data mining/machine learning is the new statistics! (The infinity symbol ∞ in their equations, which says that as n (the sample size) goes to infinity the behavior of everything becomes predictable (!), is no longer just an idealization.)
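
The "n goes to infinity" point can be made concrete with a minimal sketch. It assumes the data are uniform random draws on [0, 1), whose true mean is 0.5; the helper name `sample_mean` and the seed are illustrative choices, not part of any library. The printed error generally shrinks as n grows.

```python
import random


def sample_mean(n, seed=0):
    """Average of n pseudo-random draws from the uniform distribution on [0, 1)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatable runs
    return sum(rng.random() for _ in range(n)) / n


# The true mean is 0.5; print the absolute error for growing sample sizes.
for n in (10, 1_000, 100_000):
    print(n, abs(sample_mean(n) - 0.5))
```

With only a handful of observations the estimate can be far off, which is the barrier the last bullet refers to; with a large sample the estimate settles close to the true value.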

Regarding your last question: in my opinion, any meaningful research either applies statistical methods to big data, which is where DM/ML comes in handy, or applies a DM/ML method that was itself designed on the basis of classical statistics. Every piece of DM/ML research falls into one of these two categories, so statistics is never excluded, least of all when the goal is to come up with a novel DM/ML algorithm to analyze, cluster, or classify big data.

Upvotes: 0

SpeedBirdNine

Reputation: 4676

Data mining is the process of extracting useful information from data, such as patterns, trends, customer/user behavior, likes/dislikes, etc. This involves the use of algorithms related to artificial intelligence and statistics.

Wikipedia's definition of Data Mining is:

Data Mining (the analysis step of the Knowledge Discovery in Databases process,[1] or KDD), a relatively young and interdisciplinary field of computer science,[2][3] is the process of discovering new patterns from large data sets involving methods from statistics and artificial intelligence but also database management. In contrast to for example machine learning, the emphasis lies on the discovery of previously unknown patterns as opposed to generalizing known patterns to new data.

Machine learning involves making the computer "learn" that behavior, trend, etc., and act accordingly. For example, in credit card fraud detection, the computer "learns" the behavior of a customer, and if something strange occurs (a transaction involving a very high amount, etc.), it flags that transaction as potential fraud.
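
A minimal sketch of the fraud example, assuming the "learned" behavior is just the mean and standard deviation of a customer's past amounts and that "strange" means more than three standard deviations above that mean. The function names, the threshold, and the sample amounts are all made up for illustration; real fraud systems use far richer models.

```python
from statistics import mean, stdev


def learn_profile(amounts):
    """Summarize past transaction amounts as (mean, standard deviation)."""
    return mean(amounts), stdev(amounts)


def is_suspicious(amount, profile, threshold=3.0):
    """Return True if the amount is more than `threshold` standard deviations above the mean."""
    mu, sigma = profile
    if sigma == 0:
        # All past amounts were identical; anything larger counts as unusual.
        return amount > mu
    return (amount - mu) / sigma > threshold


# Hypothetical purchase history for one customer.
history = [42.0, 18.5, 63.2, 25.0, 37.8, 51.1, 29.9]
profile = learn_profile(history)

print(is_suspicious(45.0, profile))    # an ordinary purchase
print(is_suspicious(2500.0, profile))  # a very high amount
```

The "learning" step here is the statistics computed from the history, and the "acting" step is the flag raised on a new transaction.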

Wikipedia's definition of machine learning is:

Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. Machine Learning is concerned with the development of algorithms allowing the machine to learn via inductive inference based on observing data that represents incomplete information about statistical phenomenon. Classification which is also referred to as pattern recognition, is an important task in Machine Learning, by which machines “learn” to automatically recognize complex patterns, to distinguish between exemplars based on their different patterns, and to make intelligent decisions.

Machine learning uses data mining to learn the pattern, behavior, trend, etc., because data mining is the way of extracting this information from a set of data. Data mining and machine learning both use statistics to make decisions. So yes, statistics is involved and is very important in both data mining and machine learning.

Upvotes: 4

NPE

Reputation: 500923

There tends to be a lot of overlap between what different people call machine learning, data mining and statistics. The very definitions of the terms would depend on whom you ask.

Here is a nice overview, with lots of great links.

Upvotes: 3
