Reputation: 4139
How do you get predictions out of a RandomForestClassifier? Loosely following the latest docs here, my code looks like...
# Split the data into training and test sets (25% held out for testing)
SPLIT_SEED = 64 # some const seed just for reproducibility
TRAIN_RATIO = 0.75
(trainingData, testData) = df.randomSplit([TRAIN_RATIO, 1-TRAIN_RATIO], seed=SPLIT_SEED)
print(f"Training set ({trainingData.count()}):")
trainingData.show(n=3)
print(f"Test set ({testData.count()}):")
testData.show(n=3)
# Train a RandomForest model.
from pyspark.ml.classification import RandomForestClassifier

rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)
rf.fit(trainingData)
#print(rf.featureImportances)
preds = rf.transform(testData)
When running this, I get the error
AttributeError: 'RandomForestClassifier' object has no attribute 'transform'
Examining the Python API docs, I see nothing that looks like it relates to generating predictions from the trained model (nor feature importances, for that matter). I don't have much experience with MLlib, so I'm not sure what to make of this. Does anyone with more experience know what to do here?
Upvotes: 0
Views: 3057
Reputation: 725
By looking closely at the documentation:
>>> model = rf.fit(td)
>>> model.featureImportances
SparseVector(1, {0: 1.0})
>>> allclose(model.treeWeights, [1.0, 1.0, 1.0])
True
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> result = model.transform(test0).head()
>>> result.prediction
you will notice that rf.fit returns a fitted model, which is a different object from the original RandomForestClassifier estimator. It is the fitted model that has the transform method, as well as the featureImportances attribute.
So in your code:
# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)
model = rf.fit(trainingData)
# print(model.featureImportances)
preds = model.transform(testData)
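For completeness, here is a minimal sketch of how you might inspect and score those predictions. It assumes the default "prediction" and "probability" output columns, and the MulticlassClassificationEvaluator part is an addition that was not in your original code:
# Peek at the predictions alongside the true labels
preds.select("labels", "prediction", "probability").show(n=5)
# Feature importances live on the fitted model, not the estimator
print(model.featureImportances)
# Score the model on the held-out test set
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
evaluator = MulticlassClassificationEvaluator(
    labelCol="labels", predictionCol="prediction", metricName="accuracy")
print(f"Test accuracy: {evaluator.evaluate(preds):.3f}")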
Upvotes: 2