Reputation: 1298
According to the Spark ML docs, random forests and gradient-boosted trees can be used for both classification and regression problems.
Suppose my "label" takes integer values from 0..n and I want to train these models for a regression problem, predicting a continuous value for the label field. However, I don't see in the documentation how either model should be configured for this, and I don't see any class parameters that distinguish regression from classification. How should they be configured for regression problems, then?
Upvotes: 0
Views: 1265
Reputation: 60370
There is no such configuration involved, simply because regression and classification are handled by different submodules and classes in Spark ML. For classification, you should use (assuming PySpark):
from pyspark.ml.classification import GBTClassifier # GBT
from pyspark.ml.classification import RandomForestClassifier # RF
while for regression you should use, respectively:
from pyspark.ml.regression import GBTRegressor # GBT
from pyspark.ml.regression import RandomForestRegressor # RF
Check the Classification and regression overview in the docs for more details.
Upvotes: 1