Reputation: 4139
How do you get predictions out of a RandomForestClassifier? Loosely following the latest docs here, my code looks like...
# Split the data into training and test sets (25% held out for testing)
SPLIT_SEED = 64 # some const seed just for reproducibility
TRAIN_RATIO = 0.75
(trainingData, testData) = df.randomSplit([TRAIN_RATIO, 1-TRAIN_RATIO], seed=SPLIT_SEED)
print(f"Training set ({trainingData.count()}):")
trainingData.show(n=3)
print(f"Test set ({testData.count()}):")
testData.show(n=3)
# Train a RandomForest model.
from pyspark.ml.classification import RandomForestClassifier

rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)
rf.fit(trainingData)
#print(rf.featureImportances)
preds = rf.transform(testData)
When running this, I get the error
AttributeError: 'RandomForestClassifier' object has no attribute 'transform'
Examining the Python API docs, I see nothing that looks like it relates to generating predictions from the trained model (nor feature importances, for that matter). I don't have much experience with MLlib, so I'm not sure what to make of this. Does anyone with more experience know what to do here?
Upvotes: 0
Views: 3057
Reputation: 725
By looking closely at the documentation:
>>> model = rf.fit(td)
>>> model.featureImportances
SparseVector(1, {0: 1.0})
>>> allclose(model.treeWeights, [1.0, 1.0, 1.0])
True
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> result = model.transform(test0).head()
>>> result.prediction
you will notice that rf.fit returns a fitted model, which is a different object from the original RandomForestClassifier estimator. It is the fitted model that has the transform method, as well as the featureImportances attribute.
So in your code:
# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)
model = rf.fit(trainingData)
# print(model.featureImportances)
preds = model.transform(testData)
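For completeness, here is a minimal sketch of how you might inspect and score those predictions. It assumes the default "prediction" and "probability" output columns, and the MulticlassClassificationEvaluator part is an addition that was not in your original code:
# Peek at the predictions alongside the true labels
preds.select("labels", "prediction", "probability").show(n=5)
# Feature importances live on the fitted model, not the estimator
print(model.featureImportances)
# Score the model on the held-out test set
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
evaluator = MulticlassClassificationEvaluator(
    labelCol="labels", predictionCol="prediction", metricName="accuracy")
print(f"Test accuracy: {evaluator.evaluate(preds):.3f}")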
Upvotes: 2