Reputation: 71
I have a multiclass classification problem that I want to solve with logistic regression. I know it could also be tackled with decision trees or a random forest, but I want to stick specifically with "LogisticRegressionWithLBFGS". The data cleaning is done, and the data sits in a DataFrame with three columns: a label field (String), a feature vector (a vector of numeric features), and a third column "LabelIndex" (numbers representing the class).
When I do a train/test split on the DataFrame and try to fit LogisticRegressionWithLBFGS on the training set:
val model = new LogisticRegressionWithLBFGS().setNumClasses(10).setIntercept(true).setValidateData(true).run("trainingData")
It doesn't like the "run" part.
The example I am working from loads a data file via:
val data = MLUtils.loadLibSVMFile(Spark.sparkContext, "data/mnist.bz2")
I'm trying to copy the example and slot in my own data, but my data is in a different format. From a bit of reading, I gather that I need to convert my DataFrame to an RDD[LabeledPoint] by mapping over it.
I'm having problems finding good info on how to do this.
How do I convert a DataFrame with the three fields described above, "Label" (String), "Features" (feature vector), and "IndexedLabel" (Double), into an RDD[LabeledPoint]?
Upvotes: 0
Views: 133
Reputation: 71
Got it working:
Can't convert Dataframe to Labeled Point
This link showed me how to make the conversion successfully.
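For anyone landing here, the conversion can be sketched roughly as below. This is a minimal sketch, not the exact code from the linked answer. It assumes Spark 2.x or later, that the "Features" column holds org.apache.spark.ml.linalg vectors (for example from VectorAssembler), and that "IndexedLabel" is a Double column. The object name LabeledPointConversion and the method name toLabeledPoints are made up for illustration. On Spark 1.x the column would already hold mllib vectors, so the fromML step would not apply.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical helper object; the names here are illustrative only.
object LabeledPointConversion {

  // Assumes columns "IndexedLabel" (Double) and "Features" (ml Vector),
  // as described in the question.
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.select("IndexedLabel", "Features").rdd.map { row =>
      val label = row.getAs[Double]("IndexedLabel")
      val features = row.getAs[MLVector]("Features")
      // LogisticRegressionWithLBFGS expects the older mllib vector type,
      // so convert the ml vector before building the LabeledPoint.
      LabeledPoint(label, OldVectors.fromML(features))
    }
}
```

The resulting RDD, rather than a string, is what gets passed to run, e.g. `new LogisticRegressionWithLBFGS().setNumClasses(10).run(LabeledPointConversion.toLabeledPoints(trainingData))`, where trainingData is the training DataFrame from the split.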
Upvotes: 0