In Udacity's Intro to Machine Learning class, I am finding that the result of my code can change each time I run it. The correct values are acc_min_samples_split_2 = .908 and acc_min_samples_split_50 = .912, but when I run my script, acc_min_samples_split_2 sometimes comes out as .912 as well. This happens both on my local machine and in the Udacity web interface. Why might this be happening?
The program uses the scikit-learn library for Python. Here is the part of the code that I wrote:
```python
from sklearn import tree

def classify(features, labels, samples):
    # Creates a new Decision Tree Classifier and fits it to the sample data
    # using the specified min_samples_split value
    clf = tree.DecisionTreeClassifier(min_samples_split=samples)
    clf = clf.fit(features, labels)
    return clf

# features_train, labels_train, features_test, labels_test are defined
# earlier in the script (not shown)

# Create a classifier with a min_samples_split of 2 and test its accuracy
clf2 = classify(features_train, labels_train, 2)
acc_min_samples_split_2 = clf2.score(features_test, labels_test)

# Create a classifier with a min_samples_split of 50 and test its accuracy
clf50 = classify(features_train, labels_train, 50)
acc_min_samples_split_50 = clf50.score(features_test, labels_test)

def submitAccuracies():
    return {"acc_min_samples_split_2": round(acc_min_samples_split_2, 3),
            "acc_min_samples_split_50": round(acc_min_samples_split_50, 3)}

print(submitAccuracies())
```
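Some classifiers in scikit-learn are stochastic: they use a pseudo-random number generator (PRNG) internally.
DecisionTreeClassifier is one of them. Its features are randomly permuted at each split, so when several candidate splits are equally good, which one gets chosen can change from run to run. Check the docs and use the argument random_state
to make that random behaviour deterministic.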
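A small way to see where that randomness shows up, using made-up data as an assumption: if two feature columns are exact copies, a split on one is exactly as good as the same split on the other, so the feature chosen at the root depends only on the order in which features are visited. The toy data and the `root_feature` helper are hypothetical; `clf.tree_.feature[0]` is the feature index used at the root node.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: column 1 is an exact copy of column 0,
# so every split on either column is tied in quality.
rng = np.random.RandomState(123)
x = rng.rand(200)
X = np.column_stack([x, x])
y = (x > 0.5).astype(int)

def root_feature(random_state):
    """Fit a tree and return the index of the feature used at the root node."""
    clf = DecisionTreeClassifier(random_state=random_state)
    clf.fit(X, y)
    return int(clf.tree_.feature[0])

# Without a fixed seed, the tie may be broken differently on each fit.
unseeded = [root_feature(None) for _ in range(10)]
# With a fixed seed, every fit visits the features in the same order.
seeded = [root_feature(0) for _ in range(10)]

print("random_state=None:", unseeded)
print("random_state=0:   ", seeded)
```

In the question's data the tied splits are on different features, which can classify the test set differently, so the accuracy itself moves between runs.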
Just create your classifier like this:
```python
clf = tree.DecisionTreeClassifier(min_samples_split=samples, random_state=0)  # or any other constant
```
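Threaded through the question's `classify` helper, that might look like the sketch below. The extra `seed` parameter is a hypothetical addition, and the synthetic data from `make_classification` is an assumed stand-in for the course's training and test sets, which are not shown in the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def classify(features, labels, samples, seed=0):
    """Fit a decision tree with the given min_samples_split.

    `seed` (hypothetical parameter) is passed to random_state so that
    repeated runs build the same tree.
    """
    clf = DecisionTreeClassifier(min_samples_split=samples, random_state=seed)
    return clf.fit(features, labels)

# Synthetic stand-in for the course's features_train/labels_train etc.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
features_train, features_test, labels_train, labels_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

acc_2 = classify(features_train, labels_train, 2).score(features_test, labels_test)
acc_50 = classify(features_train, labels_train, 50).score(features_test, labels_test)
print({"acc_min_samples_split_2": round(acc_2, 3),
       "acc_min_samples_split_50": round(acc_50, 3)})
```

Keeping the seed as a keyword argument with a default leaves the original three-argument calls working unchanged.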
If you don't provide a random_state (or pass None), the tree draws from NumPy's global PRNG, which is seeded from an external source when the process starts (typically operating-system entropy), so results can differ across runs of the script.
Two runs sharing the same code, data, and constant seed will behave identically (ignoring some pathological architecture/platform differences).
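The seeding behaviour can be checked directly with scikit-learn's `check_random_state` helper, which is how estimators turn a `random_state` argument into a generator. This is a sketch; `np.random.mtrand._rand` is a private NumPy name used here only to show that `None` maps to the global generator.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seed gives a fresh RandomState, so the same seed
# always reproduces the same sequence of draws.
a = check_random_state(0).randint(0, 100, size=5)
b = check_random_state(0).randint(0, 100, size=5)
print(a, b)  # the two arrays are identical

# None returns NumPy's global RandomState singleton; its state depends on
# how it was seeded and how much it has been used earlier in the process.
global_rs = check_random_state(None)
print(global_rs is np.random.mtrand._rand)

# Seeding the global generator therefore also makes random_state=None
# reproducible within one script, although an explicit seed is clearer.
np.random.seed(0)
c = check_random_state(None).randint(0, 100, size=5)
print(c)
```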