Sourav Mohanty

Reputation: 3

How to add a column of type RDD[Int] to RDD[Vector]

Let's say we have a variable var1 of type org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] and another variable var2 of type org.apache.spark.rdd.RDD[Int]. Both have the same number of rows.

What I want is to add var2 as a new column to var1.

Upvotes: 0

Views: 322

Answers (1)

mgaido

Reputation: 3055

The easiest way to achieve your goal is to do this:

vv.zip(ii).map( t => Vectors.dense(t._1.toArray ++ Array(t._2.toDouble) ) )

where vv is your RDD[Vector] and ii is your RDD[Int]. It may not be the most efficient way, but it's the easiest one.
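For completeness, here is a minimal sketch of the same idea with the imports it needs, plus a fallback for when the two RDDs are not partitioned identically. The names AppendColumn, appendColumn and appendColumnByIndex are made up for illustration; only zip, zipWithIndex, join, sortByKey and Vectors.dense come from Spark itself.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object AppendColumn {

  // Pairs rows positionally with zip, then appends the Int as a trailing Double.
  // zip assumes both RDDs have the same number of partitions and the same
  // number of elements in each partition.
  def appendColumn(vv: RDD[Vector], ii: RDD[Int]): RDD[Vector] =
    vv.zip(ii).map { case (v, i) => Vectors.dense(v.toArray :+ i.toDouble) }

  // Fallback sketch: key both RDDs by row index and join on that key.
  // This triggers a shuffle, so it is likely slower than zip.
  def appendColumnByIndex(vv: RDD[Vector], ii: RDD[Int]): RDD[Vector] = {
    val left  = vv.zipWithIndex().map { case (v, idx) => (idx, v) }
    val right = ii.zipWithIndex().map { case (i, idx) => (idx, i) }
    left.join(right)
      .sortByKey() // restore the original row order
      .values
      .map { case (v, i) => Vectors.dense(v.toArray :+ i.toDouble) }
  }
}
```

With vv and ii as above, AppendColumn.appendColumn(vv, ii) should give the same result as the one-liner. If zip complains about partition counts or partition sizes (for example because the two RDDs were built independently), the index-based variant is one way around it, at the cost of a join.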

Upvotes: 1
