Madzor

Reputation: 275

Get range of Dataframe Row

So I've loaded a dataframe from a parquet file. This dataframe now contains an unspecified number of columns. The first column is a Label, and the following are features.

I want to save each row in the dataframe as a LabeledPoint.

So far I'm thinking:

val labeledPoints: RDD[LabeledPoint] = df.map { row => LabeledPoint(row.getInt(0), Vectors.dense(row.getDouble(1), row.getDouble(2))) }

It's easy to hard-code the column indexes, but that won't hold up with a lot of columns. I'd like to load the entire row starting from index 1 (since index 0 is the label) into a dense vector.

Any ideas?

Upvotes: 0

Views: 587

Answers (1)

Till Rohrmann

Reputation: 13346

This should do the trick:

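import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.Row

// df.rdd yields an RDD[Row]; on Spark 2.x+, a plain df.map would need an Encoder instead.
df.rdd.map {
  row: Row =>
    // Column 0 is the label; columns 1 until row.length are the features.
    val data = for (index <- 1 until row.length) yield row.getDouble(index)
    val vector = new DenseVector(data.toArray)
    new LabeledPoint(row.getInt(0), vector)
}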
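
One caveat: `row.getDouble(index)` only works when the column really holds doubles. If some feature columns might be integers, longs, or floats, a more forgiving variant is to work on the untyped values (`Row.toSeq` returns a `Seq[Any]`) and convert each cell. The following is only a minimal sketch of that idea in plain Scala, with no Spark dependency; `RowSplit`, `asDouble`, and `toLabelAndFeatures` are hypothetical names made up for illustration, and it assumes every cell is a non-null number.

```scala
object RowSplit {
  // Hypothetical helper: converts one untyped cell to Double.
  // Any boxed numeric type (Int, Long, Float, Double, ...) is accepted.
  def asDouble(cell: Any): Double = cell match {
    case n: java.lang.Number => n.doubleValue()
    case other =>
      throw new IllegalArgumentException(s"Non-numeric cell: $other")
  }

  // Splits a row's values into (label, features): index 0 is the label,
  // everything from index 1 onward becomes the feature array.
  def toLabelAndFeatures(values: Seq[Any]): (Double, Array[Double]) = {
    require(values.nonEmpty, "row must contain at least a label")
    (asDouble(values.head), values.tail.map(asDouble).toArray)
  }
}
```

With Spark, this would presumably be called as `RowSplit.toLabelAndFeatures(row.toSeq)` inside the map, and the resulting pair passed on as `LabeledPoint(label, Vectors.dense(features))`.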

Upvotes: 1
