Reputation: 275
I've loaded a DataFrame from a Parquet file. It contains an unspecified number of columns: the first column is the label, and the remaining columns are features.
I want to save each row in the dataframe as a LabeledPoint.
So far I'm thinking:
val labeledPoints: RDD[LabeledPoint] = df.map { row => LabeledPoint(row.getInt(0), Vectors.dense(row.getDouble(1), row.getDouble(2))) }
It's easy to list the column indexes by hand, but that doesn't scale to a large number of columns. I'd like to load the entire row starting from index 1 (since index 0 is the label) into a dense vector.
Any ideas?
Upvotes: 0
Views: 587
Reputation: 13346
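This should do the trick:
import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.Row

// On Spark 2.x and later, call df.rdd.map instead: DataFrame.map there returns a Dataset and needs an Encoder.
df.map { row: Row =>
  // Read every column after the label (index 0) as a Double
  val data = for (index <- 1 until row.length) yield row.getDouble(index)
  val vector = new DenseVector(data.toArray)
  new LabeledPoint(row.getInt(0), vector)
}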
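If the feature columns are not all stored as doubles, `row.getDouble` will fail with a cast error on the first non-double column. One possible way around that is to do the conversion in a small helper that works on the row's raw values. The sketch below is only an illustration: `RowConversion` and `toLabelAndFeatures` are hypothetical names, not part of Spark, and the helper assumes every column holds a non-null numeric value.

```scala
// Hypothetical helper (not part of Spark): splits one row's values into
// a label (column 0) and a feature array (all remaining columns).
object RowConversion {
  // Convert any boxed numeric value (Int, Long, Float, Double, ...) to Double.
  // Nulls and non-numeric values are rejected with an exception.
  private def toDouble(value: Any): Double = value match {
    case n: java.lang.Number => n.doubleValue()
    case other =>
      throw new IllegalArgumentException(s"Non-numeric value: $other")
  }

  def toLabelAndFeatures(values: Seq[Any]): (Double, Array[Double]) = {
    require(values.nonEmpty, "row must contain at least a label column")
    (toDouble(values.head), values.tail.map(toDouble).toArray)
  }
}
```

With Spark on the classpath, this could be wired in as `df.rdd.map { row => val (label, features) = RowConversion.toLabelAndFeatures(row.toSeq); LabeledPoint(label, Vectors.dense(features)) }`, using `Row.toSeq` to get the values and `Vectors.dense` to build the vector from the array.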
Upvotes: 1