Mohamed Karim Bouaziz

Reputation: 111

Spark Multi Label classification

I am looking to implement a multi-label classification algorithm with multiple outputs in Spark, but I am surprised that there isn't any model in the Spark machine learning libraries that can do this.

How can I do this with Spark?

Otherwise, scikit-learn's Logistic Regression supports multi-label classification in input/output, but it doesn't support a huge amount of training data.

To view the scikit-learn code, please see the following link: https://gist.github.com/mkbouaziz/5bdb463c99ba9da317a1495d4635d0fc

Upvotes: 11

Views: 6816

Answers (1)

marilena.oita

Reputation: 994

Spark also has a Logistic Regression implementation that, according to the API documentation, supports multilabel classification. See also this.

The problem that you have with scikit-learn on a huge amount of training data will disappear with Spark, given an appropriate Spark configuration.

Another approach is to train one binary classifier per label and obtain the multilabel output by running a relevant/irrelevant prediction for each label. You can easily do that in Spark using any binary classifier.
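To make the one-classifier-per-label idea concrete, here is a minimal plain-Python sketch of the decomposition itself. It is not Spark code: `CentroidBinaryClassifier`, `fit_binary_relevance` and `predict_labels` are hypothetical names made up for illustration, and the toy centroid model only stands in for a real binary classifier. In Spark, one would presumably replace it with something like `pyspark.ml.classification.LogisticRegression`, fitted once per label against a 0/1 column for that label.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


class CentroidBinaryClassifier:
    """Toy stand-in for any binary classifier (hypothetical, illustration only)."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidBinaryClassifier":
        # Store the mean feature vector of each class that actually occurs.
        self.centroids: Dict[int, List[float]] = {}
        for cls in (0, 1):
            rows = [x for x, target in zip(X, y) if target == cls]
            if rows:
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Vector) -> int:
        # Predict the class whose centroid is closest to x.
        def sq_dist(centroid: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda cls: sq_dist(self.centroids[cls]))


def fit_binary_relevance(
    X: List[Vector],
    Y: List[Set[str]],
    make_classifier: Callable[[], CentroidBinaryClassifier] = CentroidBinaryClassifier,
) -> Dict[str, CentroidBinaryClassifier]:
    """Train one relevant/irrelevant model per label."""
    labels = sorted(set().union(*Y))
    models = {}
    for label in labels:
        # 1 if the example carries this label, 0 otherwise.
        targets = [1 if label in example_labels else 0 for example_labels in Y]
        models[label] = make_classifier().fit(X, targets)
    return models


def predict_labels(models: Dict[str, CentroidBinaryClassifier], x: Vector) -> Set[str]:
    """Collect every label whose binary model says 'relevant'."""
    return {label for label, model in models.items() if model.predict(x) == 1}


# Tiny made-up dataset: label "a" follows feature 0, label "b" follows feature 1.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [set(), {"b"}, {"a"}, {"a", "b"}]
models = fit_binary_relevance(X, Y)
print(predict_labels(models, [0.9, 0.8]))
```

Each label is learned independently, so the per-label models can be trained in parallel or one after another; the trade-off is that correlations between labels are ignored.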

Indirectly, what might also help is multilabel categorization with nearest neighbors, which is also state of the art. There are some nearest-neighbor Spark extensions, such as Spark KNN or Spark KNN graphs, for instance.
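As a rough illustration of the nearest-neighbor idea, the plain-Python sketch below assigns a label to a query point when more than half of its k nearest training points carry that label. This is a simplified majority vote, not the API of Spark KNN or any other extension, and `knn_multilabel` and the sample data are made up for illustration.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], Set[str]]


def knn_multilabel(train: List[Example], query: Sequence[float], k: int = 3) -> Set[str]:
    """Return the labels held by more than half of the k nearest training points."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Brute-force neighbor search; fine for a sketch, too slow for huge data.
    neighbors = sorted(train, key=lambda row: sq_dist(row[0], query))[:k]
    votes = Counter(label for _, labels in neighbors for label in labels)
    return {label for label, count in votes.items() if count > len(neighbors) / 2}


# Made-up training data: (features, set of labels).
train = [
    ([0.0, 0.0], {"x"}),
    ([0.0, 1.0], {"x", "y"}),
    ([1.0, 0.0], {"x"}),
    ([5.0, 5.0], {"z"}),
    ([5.0, 6.0], {"z", "y"}),
]
print(knn_multilabel(train, [0.2, 0.2], k=3))
```

The brute-force search is the part a distributed nearest-neighbor library would replace; the voting step stays the same.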

Upvotes: 7
