Illegal Argument Exception using Random Forest in PySpark mllib

I am using the Random Forest algorithm for classification in Spark MLlib with PySpark. My code is as follows:

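from pyspark.mllib.tree import RandomForest

# Train a 3-class random forest on the training data
model = RandomForest.trainClassifier(trnData, numClasses=3, categoricalFeaturesInfo={}, numTrees=3, featureSubsetStrategy="auto", impurity='gini', maxDepth=4, maxBins=32)

# Predict on the test features and pair each true label with its prediction
predictions = model.predict(tst_dataRDD.map(lambda x: x.features))
labelsAndPredictions = tst_dataRDD.map(lambda lp: lp.label).zip(predictions)

# Fraction of misclassified test points
testErr = labelsAndPredictions.filter(lambda x: x[0] != x[1]).count() / float(tst_dataRDD.count())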

I got this error: IllegalArgumentException: GiniAggregator given label -0.0625 but requires label to be non-negative.
How can I solve this problem? Thanks.

Upvotes: 0

Answers (2)

Som

Reputation: 6323

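It seems that for Gini impurity in multiclass classification, the labels must be non-negative (>= 0). MLlib expects class labels to be the indices 0, 1, ..., numClasses - 1, so a label such as -0.0625 suggests that the label column holds scaled or continuous values rather than class indices. Please check your training data for negative or non-integer labels.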
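As a rough sketch of that check, the helpers below flag labels that are not valid class indices and build a mapping from raw labels to indices 0..k-1. The function names and the sample values are assumptions made for illustration. The list stands in for the distinct labels of trnData, which could be collected with something like trnData.map(lambda lp: lp.label).distinct().collect(). Remapping only makes sense if the labels really are categories. If the target is continuous, a regression model is probably the better fit.

```python
def find_invalid_labels(labels, num_classes):
    """Return the sorted distinct labels that are not integer class
    indices in the range [0, num_classes)."""
    bad = set()
    for label in labels:
        value = float(label)
        if not value.is_integer() or value < 0 or value >= num_classes:
            bad.add(value)
    return sorted(bad)


def build_label_index(labels):
    """Map each distinct raw label to a class index 0..k-1,
    assigning indices in ascending order of the raw values."""
    return {raw: float(i) for i, raw in enumerate(sorted(set(labels)))}


# Hypothetical sample values standing in for the labels found in trnData.
raw_labels = [-0.0625, 0.0, 0.0625, 0.0, -0.0625]

invalid = find_invalid_labels(raw_labels, num_classes=3)
label_index = build_label_index(raw_labels)

print("invalid labels:", invalid)
print("raw label -> class index:", label_index)
```

If the mapping looks right, it could be applied to the RDD before training by rebuilding each LabeledPoint with label_index[lp.label] as its label and the original features.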

ref - spark repo

Also, as a side note, please use the algorithms from the ml package rather than the legacy mllib package.

Upvotes: 1

Hossein Torabi

Reputation: 733

Please use RandomForestClassifier instead and see the docs: https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier
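A minimal sketch of what that could look like with the DataFrame-based API, keeping the hyperparameters from the question. The column names "label", "features" and "indexedLabel", and the helper name build_rf_pipeline, are assumptions for illustration. StringIndexer is added here so that arbitrary label values get mapped to indices starting at 0. The imports sit inside the function because building these stages needs an active Spark session.

```python
def build_rf_pipeline(label_col="label", features_col="features"):
    """Build an (unfitted) pipeline: index the labels, then train a
    random forest. Column names are assumptions, adjust to your data."""
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer

    # Map raw label values to indices 0..k-1 (most frequent label first).
    indexer = StringIndexer(inputCol=label_col, outputCol="indexedLabel")

    # Same settings as the mllib call in the question.
    rf = RandomForestClassifier(
        labelCol="indexedLabel",
        featuresCol=features_col,
        numTrees=3,
        maxDepth=4,
        maxBins=32,
        impurity="gini",
        featureSubsetStrategy="auto",
    )
    return Pipeline(stages=[indexer, rf])
```

With DataFrames train_df and test_df (hypothetical names) that have those columns, usage would be along the lines of model = build_rf_pipeline().fit(train_df) followed by model.transform(test_df).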

Upvotes: 0
