WeaselFox

Reputation: 7400

using RandomForestClassifier.predict_proba vs RandomForestRegressor.predict

I have a data set comprising a vector of features, and a target - either 1.0 or 0.0 (representing two classes). If I fit a RandomForestRegressor and call its predict function, is it equivalent to using RandomForestClassifier.predict_proba()?

In other words, if the target is 1.0 or 0.0, does RandomForestRegressor output probabilities?

I think so, and the results I am getting suggest so, but I would like to get a second opinion...

Thanks Weasel

Upvotes: 4

Views: 5541

Answers (1)

alko

Reputation: 48357

There is a major conceptual difference between the two, based on the different tasks they address:

Regression: continuous (real-valued) target variable.

Classification: discrete target variable (classes).

For a general classification method, the notion of "the probability that an observation belongs to class X" may not be defined, because some classification methods, kNN for example, are not inherently probabilistic.

However, for Random Forest (and some other classification methods), classification is reduced to regression on the class probability distribution. The predicted class is then taken as the argmax of the computed "probabilities". In your case, with a 0.0/1.0 target, the mean target in a leaf is simply the fraction of class-1 samples in that leaf, so if you feed both models the same input with the same settings, you get essentially the same result. And yes, it is OK to treat the values returned by RandomForestRegressor as probabilities.
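To make the argument concrete, here is a minimal sketch in plain Python (no scikit-learn), assuming the forest averages per-tree leaf values. The leaf contents and the helper names (`leaf_labels_per_tree`, `regressor_predict`, `classifier_predict_proba`) are made up for illustration and are not part of any library. It only shows why the two averages coincide for a 0/1 target; a real forest would also have to be trained with the same hyperparameters for both models.

```python
# Hypothetical 0/1 training labels found in the leaf that one query point
# reaches, one list per tree (made-up numbers for illustration only).
leaf_labels_per_tree = [
    [1, 1, 0, 1],
    [1, 0],
    [1, 1, 1],
    [0, 1, 1, 0, 1],
]


def avg(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def regressor_predict(leaves):
    """Regression view: average the mean target of each tree's leaf."""
    return avg(avg(labels) for labels in leaves)


def classifier_predict_proba(leaves):
    """Classification view: average each tree's class fractions in the leaf."""
    p1 = avg(labels.count(1) / len(labels) for labels in leaves)
    return [1.0 - p1, p1]  # [P(class 0), P(class 1)]


reg_out = regressor_predict(leaf_labels_per_tree)
proba = classifier_predict_proba(leaf_labels_per_tree)

# The predicted class is the argmax over the class "probabilities".
predicted_class = max(range(len(proba)), key=proba.__getitem__)

print(reg_out, proba, predicted_class)
```

For a 0/1 target, the mean of the labels in a leaf and the fraction of 1s in that leaf are the same number, which is why `reg_out` matches `proba[1]` here.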

Upvotes: 4
