sqrt2sqrt2

Reputation: 15

Multi classification with Random Forest - how to measure the "stability" of results

I am using Random Forest (from sklearn) for a multi-classification problem with ordered classes (say 0, ..., n, with n=4 in my specific case) that are roughly equally distributed. I have many observations (roughly 5000) and I split them into train/test sets at 70%/30% respectively; the classes are equally distributed in train and test as well. I set random_state=None, so each time I re-run the fitting of the model (on the same training set) and then the prediction, I obtain slightly different results on my test set.
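For reference, a minimal sketch of the setup described above (the dataset and names here are hypothetical stand-ins, just to make it concrete):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# hypothetical stand-in for the real data: ~5000 samples, 5 ordered classes (0..4)
X, y = make_classification(n_samples=5000, n_classes=5,
                           n_informative=8, random_state=0)

# 70/30 split, stratified so the classes stay roughly equally distributed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

# random_state=None: every re-fit uses a fresh seed, so repeated runs
# on the same training set yield slightly different predictions
clf = RandomForestClassifier(random_state=None)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)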

My question is how to measure whether Random Forest is working well by comparing different predictions on the same test set...

For example, if I obtain as predictions first only 0 and then only n (where, as said, 0 and n are the most different classes), I would say that the RF is not working at all. On the contrary, if only a few predictions change from a class to a close one (e.g. first 0 and then 1), I would say the RF is working well.
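One way to make that criterion concrete (a sketch of my own, not a built-in sklearn command): since the classes are ordered, compare two prediction runs by the mean absolute class distance between them, so a 0-to-1 flip counts far less than a 0-to-n flip:

import numpy as np

def prediction_shift(pred_a, pred_b):
    # mean absolute class distance between two prediction runs:
    # 0.0 means identical predictions; small values mean a few samples
    # moved to a neighbouring class; large values mean jumps between distant classes
    return np.mean(np.abs(np.asarray(pred_a) - np.asarray(pred_b)))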

Is there a specific command to check this automatically?

Upvotes: 0

Views: 445

Answers (2)

Alex Serra Marrugat

Reputation: 2042

One solution is to use cross-validation. With this you will obtain a robust measure of the general accuracy of the model.

Then you will train and test n different models (check this link, it is pretty well explained). You can calculate the accuracy of each model and then take the mean of these measures. An example would be (with 5 splits):

from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X, y, cv=5)

Then print the mean and standard deviation of all of these accuracies:

print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))

Upvotes: 0

mss

Reputation: 368

I think for this type of investigation we do not care whether the classifier made the right prediction; we want to know whether it made stable (i.e. consistent) predictions.

Assume repeated_predictions has shape [repetitions, samples] and contains, for each repetition, the predicted class of every sample.

What about:

import numpy as np
np.mean(np.std(repeated_predictions, axis=0))
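A sketch of how such a matrix could be built and scored (the loop and the helper name are my own, not part of sklearn):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stability_score(X_train, y_train, X_test, repetitions=10):
    # re-fit the forest several times; random_state=None draws a new seed each time
    preds = [RandomForestClassifier(random_state=None)
             .fit(X_train, y_train)
             .predict(X_test)
             for _ in range(repetitions)]
    repeated_predictions = np.array(preds)  # shape: [repetitions, samples]
    # per-sample std of the predicted class labels, averaged over samples;
    # 0.0 means perfectly stable, larger values mean jumps between distant classes
    return np.mean(np.std(repeated_predictions, axis=0))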

There are also papers that analyze the consistency of Random Forests, e.g. Consistency of Random Forests and Other Averaging Classifiers, but it seems to be a tough read.

Upvotes: 0
