Odisseo

Reputation: 777

Stacking ML Algorithms in Spark

Is there a Spark API for building stacking ensembles, or does one have to build them from scratch? I haven't found any resources online about this topic.

Upvotes: 2

Views: 1937

Answers (1)

Pierre Nodet

Reputation: 66

As AKSW said in the comments, the current Apache Spark MLlib has only two specific implementations of ensemble models: Random Forests for bagging and Gradient-Boosted Trees for boosting.

As for stacking, I don't think MLlib offers anything out of the box, so you have to build it yourself, in one of two ways:

  1. Write a function that generates a Pipeline doing the stacking: your base learners first, then a VectorAssembler that gathers their predictions, then the final stacking algorithm
  2. Write a meta-estimator that takes your base learners and your stacking algorithm as parameters

The second option is convenient because a meta-estimator is itself an Estimator, so it works with all the MLlib tools, such as the tuning tools.

For the second option, I have written a library that contains Boosting, Bagging and Stacking meta-estimators: spark-ensemble

You can borrow some implementation ideas from it!

Upvotes: 5
