Apache Spark MLlib Naive Bayes LabeledPoint usage

Question

I want to use Spark MLlib Naive Bayes to process (train and test) data like this:

Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39

so that I can test for the labels Male / Female / Unknown. I want to create a LabeledPoint so that this data can be run against the MLlib Naive Bayes algorithm. The example on the Spark site

https://spark.apache.org/docs/1.0.0/mllib-naive-bayes.html

only shows data that is all numeric. Is it possible to run it on string data like this? I understand that my label will need to be converted to a double value, i.e. Male / Female / Unknown => 1.0 / 2.0 / 3.0.

If so, how do I convert the CSV data above to a LabeledPoint using this type of syntax?

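import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Parses lines of the form "label,f1 f2 f3", where every value is numeric
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(
    parts(0).toDouble,
    Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}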

Answers (1)
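
One possible approach, sketched here under stated assumptions rather than taken from the Spark docs: string columns cannot go into a LabeledPoint directly, but each one can be one-hot encoded into 0.0/1.0 slots, which keeps every feature non-negative. The sketch below is plain Scala with no Spark dependency so the encoding logic can be checked on its own. The names CategoricalEncoding, labelIndex, buildVocab and encode are hypothetical, the 0.0 / 1.0 / 2.0 label numbering is an arbitrary choice (the 1.0 / 2.0 / 3.0 mapping from the question follows the same pattern), and every non-label column, including the "75", is assumed to be categorical.

```scala
object CategoricalEncoding {
  // Label mapping; the concrete numbers are an arbitrary choice for this sketch.
  val labelIndex: Map[String, Double] =
    Map("Male" -> 0.0, "Female" -> 1.0, "Unknown" -> 2.0)

  // For every feature column (all columns after the label), give each
  // distinct string value its own slot index, in order of first appearance.
  def buildVocab(rows: Seq[Array[String]]): Vector[Map[String, Int]] = {
    val numFeatures = rows.head.length - 1
    Vector.tabulate(numFeatures) { col =>
      rows.map(row => row(col + 1)).distinct.zipWithIndex.toMap
    }
  }

  // One-hot encode the feature columns of one row into a flat array.
  // Returns (numeric label, feature values).
  def encode(row: Array[String],
             vocab: Vector[Map[String, Int]]): (Double, Array[Double]) = {
    // offsets(col) is where column col's block of slots starts
    val offsets = vocab.map(_.size).scanLeft(0)(_ + _)
    val features = Array.fill(offsets.last)(0.0)
    for ((values, col) <- vocab.zipWithIndex) {
      // a value missing from the vocabulary leaves its block all zeros
      values.get(row(col + 1)).foreach(i => features(offsets(col) + i) = 1.0)
    }
    (labelIndex(row(0)), features)
  }
}
```

In a Spark job, the pair returned by encode could then be wrapped as LabeledPoint(label, Vectors.dense(features)) inside the same data.map { line => ... } shape shown in the question, after splitting each line on ','. The vocabulary would have to be built once from the distinct values of the training data (for example, collected to the driver) and reused unchanged for the test data, so that both sets share the same feature layout.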
