Jimmyn
Jimmyn

Reputation: 541

Machine Learning Naive Bayes Classifier in Python

I've been experimenting with machine learning and need to develop a model which will make a prediction based on a number of variables. The easiest way I can explain this is through the "play golf" example below:

train.csv

Outlook,Temperature,Humidity,Windy,Play
overcast,hot,high,FALSE,yes
overcast,cool,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
rainy,mild,normal,FALSE,yes
rainy,mild,high,TRUE,no
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
sunny,mild,normal,TRUE,yes

The program will need to insert the prediction into the makeprediciton.csv file

Outlook,Temperature,Humidity,Windy,Play
rainy,hot,normal,TRUE,

I've been able to apply this classifier using excel. Wondering if there is an easy library in python which can help me group the frequencies and do the calculations rather than having to manually write code for everything.

You can see my approach through excel in the below link: http://www.filedropper.com/playgolf

Any help would be greatly appreciated.

Upvotes: 1

Views: 677

Answers (1)

Masoud
Masoud

Reputation: 1351

It depends. If you don't want to code, Try Rapidminier. It is very simple to learn and experiment. It's documentation is very good and clear.You can see This example for Naive Bayesian classifier and get a result.


Also if you want some coding and use python lang, Try Scikit-learn witch is more advanced lib in python. It utilize scipy and numpy and has very powerful implementation of data mining algorithms. For your example you must first use One-Hot-Encoding to change your categorical feature to high dimension sparse vector and then use a classifier like Naive Bayesian


Also for reading CSV file, you can use Pandas

Upvotes: 1

Related Questions