JetS79

Reputation: 71

Scala Multiclass classification with labeled point

I have a multiclass classification problem that I want to solve with logistic regression. I know it can also be tackled with decision trees or random forests, but I want to stick specifically with "LogisticRegressionWithLBFGS". The data cleaning is done, and my data sits in a DataFrame with three columns: a label field (String), a feature vector (vector of numeric features) and a third column "LabelIndex" (numbers representing the class).

When I do a train/test split on the DataFrame and try to fit LogisticRegressionWithLBFGS to the training data:

val model = new LogisticRegressionWithLBFGS().setNumClasses(10).setIntercept(true).setValidateData(true).run("trainingData")

It doesn't like the "run" part.

The example I am working from loads a data file via:

val data = MLUtils.loadLibSVMFile(Spark.sparkContext, "data/mnist.bz2")

(I'm trying to copy the example and slot in my own data, but my data is in a different format.) From some reading, I gather that I need to convert my DataFrame to an RDD[LabeledPoint] by mapping over it.

I'm having trouble finding good information on how to do this.

How do I convert a DataFrame with the 3 fields described above, "Label" (String), "Features" (feature vector), "IndexedLabel" (Double), into an RDD[LabeledPoint]?

Upvotes: 0

Views: 133

Answers (1)

JetS79

Reputation: 71

Got it working. The linked question "Can't convert Dataframe to Labeled Point" showed me how to make the conversion successfully.
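For anyone landing here without following the link, this is a minimal sketch of the kind of conversion involved, not necessarily the exact code from the linked answer. It assumes the columns are named "IndexedLabel" (Double, with values from 0 to numClasses - 1) and "Features", and that "Features" holds the newer org.apache.spark.ml.linalg.Vector type (Spark 2.x or later). If the column already holds the older mllib vector type, the Vectors.fromML call is not needed. The object name ToLabeledPoint and the helper names are made up for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical helper object; the names here are illustrative only.
object ToLabeledPoint {

  // Map each row to a LabeledPoint.
  // Assumed columns: "IndexedLabel" (Double) and "Features" (ml Vector).
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.select("IndexedLabel", "Features").rdd.map { row =>
      val label    = row.getAs[Double]("IndexedLabel")
      val features = row.getAs[MLVector]("Features")
      // LogisticRegressionWithLBFGS lives in the older mllib API,
      // so the ml vector is converted to an mllib vector.
      LabeledPoint(label, OldVectors.fromML(features))
    }

  // Same settings as in the question, but run() now receives
  // an RDD[LabeledPoint] instead of a DataFrame or a String.
  def train(trainingData: DataFrame, numClasses: Int): LogisticRegressionModel =
    new LogisticRegressionWithLBFGS()
      .setNumClasses(numClasses)
      .setIntercept(true)
      .setValidateData(true)
      .run(toLabeledPoints(trainingData))
}
```

With this in place, something like ToLabeledPoint.train(trainingData, 10) would stand in for the original call, where trainingData is the training half of the DataFrame split.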

Upvotes: 0
