Reputation: 7404
I am using Python to do a bit of machine learning.
I have a NumPy ndarray with 2000 entries. Each entry holds measurements for one subject and ends with a boolean that tells me whether that subject is a vampire.
Each entry in the array looks like this:
[height(cm), weight(kg), stake aversion, garlic aversion, reflectance, shiny, IS_VAMPIRE?]
My goal is to be able to give a probability that a new subject is a vampire given the data shown above for the subject.
I have used sklearn to do some machine learning for me:
from sklearn import tree
clf = tree.DecisionTreeRegressor()
clf = clf.fit(X, Y)
print(clf.predict(W))
Where W is an array of data for the new subject. The script I have written returns booleans, but I would like it to return probabilities. How can I modify it?
Upvotes: 5
Views: 394
Reputation: 349
You're using a regressor, but you probably want a classifier.
Specifically, you'll want a classifier that can give you posterior probabilities, such as a decision tree or logistic regression. Other classifiers may only give you a score (some kind of confidence measure), which may also work for your needs.
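To make the difference between a probability and a score concrete, here is a minimal sketch. The data is synthetic (an assumption standing in for the real X, Y and W), and LinearSVC is just one example of a classifier that exposes a score rather than a probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic stand-in data (an assumption, not the asker's dataset):
# 200 subjects, 6 features, binary label derived from the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = (X[:, 0] + X[:, 1] > 0).astype(int)
W = rng.normal(size=(3, 6))  # three hypothetical new subjects

# Logistic regression exposes class probabilities via predict_proba.
log_reg = LogisticRegression().fit(X, Y)
proba = log_reg.predict_proba(W)   # shape (3, 2); columns follow log_reg.classes_
print(proba[:, 1])                 # estimated P(label == 1) per subject

# LinearSVC has no predict_proba; decision_function returns a signed score.
svm = LinearSVC().fit(X, Y)
scores = svm.decision_function(W)  # shape (3,); larger favours class 1
print(scores)
```

The score is unbounded and only its sign and relative size are meaningful, so it can rank subjects but should not be read as a probability.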
Upvotes: 0
Reputation: 2368
If you are using DecisionTreeRegressor(), you can use its score method to get the coefficient of determination R^2 of the prediction.
The scikit-learn documentation for DecisionTreeRegressor describes this method.
You can also list the cross-validation scores (one per fold, for 10 folds) as below:
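from sklearn import tree
from sklearn.model_selection import cross_val_score
clf = tree.DecisionTreeRegressor()
clf = clf.fit(X, Y)
# One R^2 score per fold; displayed automatically in an interactive session
cross_val_score(clf, X, Y, cv=10)
print(clf.predict(W))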
The cross_val_score call gives output similar to this:
array([ 0.61..., 0.57..., -0.34..., 0.41..., 0.75...,
0.07..., 0.29..., 0.33..., -1.42..., -1.77...])
Upvotes: 3
Reputation: 1
You want to use a classifier that gives you a probability. Also, make sure that the data points in your test array W are not duplicates of any of your training data. If a test point exactly matches a training point, the model treats it as definitely a vampire or definitely not a vampire, so it will give you 0 or 1.
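A small sketch of that effect, using synthetic data (an assumption, not the real dataset) and a DecisionTreeClassifier with default settings:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption): 200 subjects, 6 continuous features,
# and a noisy binary label so that the two classes overlap.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Default settings grow the tree until every leaf is pure.
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

# Rows copied from the training set land in pure leaves.
train_proba = clf.predict_proba(X[:5])
print(train_proba)

# Unseen rows are routed to those same pure leaves.
W = rng.normal(size=(5, 6))
new_proba = clf.predict_proba(W)
print(new_proba)
```

In this sketch the unseen rows also come back as 0 or 1, because every leaf of a fully grown tree is pure. Avoiding duplicates is therefore not enough on its own with a default tree; limiting its growth (for example with max_depth or min_samples_leaf) is what lets leaves hold a mix of classes.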
Upvotes: 0
Reputation: 251408
Use a DecisionTreeClassifier instead of a regressor, and call its predict_proba method. Alternatively, you could use logistic regression (also available in scikit-learn).
The basic idea is this:
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
print(clf.predict_proba(W))
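For a tree, predict_proba reports the class fractions among the training samples in the leaf that a row falls into. The following is a hedged, self-contained sketch on synthetic data (an assumption; the real X, Y and W are not shown in the question). The min_samples_leaf=50 setting is an illustrative choice, not something the answer prescribes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption): 2000 subjects, 6 features,
# and a noisy binary label playing the role of IS_VAMPIRE.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
Y = (X[:, 0] + rng.normal(scale=1.0, size=2000) > 0).astype(int)
W = rng.normal(size=(3, 6))  # three hypothetical new subjects

# Requiring at least 50 training samples per leaf lets leaves hold a mix
# of classes, so the reported fractions can fall between 0 and 1.
clf = DecisionTreeClassifier(min_samples_leaf=50, random_state=0).fit(X, Y)

proba = clf.predict_proba(W)               # one row per subject, one column per class
vampire_col = list(clf.classes_).index(1)  # column holding P(label == 1)
print(proba[:, vampire_col])
```

Looking up the column through clf.classes_ avoids assuming which column corresponds to the positive label.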
Upvotes: 2