cody123

Reputation: 2080

I am running a GBTClassifier in Spark ML for CTR prediction, and I am getting an exception because of the maxBins parameter.

Exception details:

    Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: DecisionTree requires maxBins (= 32) to be at least as large as the number of values in each categorical feature, but categorical feature 4139 has 16094 values. Considering remove this and other categorical features with a large number of values, or add more training examples.
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:133)
        at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:137)
        at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
        at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:208)
    GBTClassifier gbt = new GBTClassifier()
            .setLabelCol("indexedclick")
            .setFeaturesCol("features_index")
            .setMaxIter(20)
            .setMaxBins(16094)   // the parameter in question
            .setMaxDepth(30)
            .setMinInfoGain(0.0001)
            .setStepSize(0.00001)
            .setSeed(200)
            .setLossType("logistic")
            .setSubsamplingRate(0.2);

I want to know what the correct maxBins value should be, because even when I set a large maxBins value I still get the same exception.

Any help would be highly appreciated.

Upvotes: 0

Views: 1005

Answers (1)

BJC

Reputation: 501

Try setting maxBins to one more than the number of values in the largest categorical feature. In this case that feature has 16094 values, so set it to 16095, i.e. setMaxBins(16095).
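Here is a minimal sketch of that change, assuming the "features_index" column was produced by a fitted VectorIndexerModel (called indexerModel below, a name introduced for illustration). If I remember correctly, its javaCategoryMaps() accessor exposes the per-feature category maps, so the required bin count can be derived from the data rather than hard-coded:

    import java.util.Map;

    import org.apache.spark.ml.classification.GBTClassifier;
    import org.apache.spark.ml.feature.VectorIndexerModel;

    public class GbtMaxBinsExample {
        // "indexerModel" is assumed to be the fitted VectorIndexerModel that
        // produced the "features_index" column used by the classifier below.
        static GBTClassifier buildGbt(VectorIndexerModel indexerModel) {
            // Largest categorical cardinality across all indexed features.
            int maxCategories = indexerModel.javaCategoryMaps().values().stream()
                    .mapToInt(Map::size)
                    .max()
                    .orElse(0);

            return new GBTClassifier()
                    .setLabelCol("indexedclick")
                    .setFeaturesCol("features_index")
                    .setMaxIter(20)
                    // maxBins must be at least as large as the biggest
                    // categorical cardinality; +1 gives the margin suggested
                    // above, and 32 is kept as a floor so the default still
                    // applies when no feature has high cardinality.
                    .setMaxBins(Math.max(32, maxCategories + 1))
                    .setMaxDepth(30)
                    .setMinInfoGain(0.0001)
                    .setStepSize(0.00001)
                    .setSeed(200)
                    .setLossType("logistic")
                    .setSubsamplingRate(0.2);
        }
    }

Deriving the value from the fitted indexer keeps maxBins in sync with the data, instead of hard-coding a count that can change between training runs.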

Upvotes: 0
