Reputation: 3
Let's say we have a variable var1 of type org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] and another variable var2 of type org.apache.spark.rdd.RDD[Int].
Both of them have the same number of rows.
What I want is to add var2 as a new column to var1.
Upvotes: 0
Views: 322
Reputation: 3055
The easiest way to achieve your goal is to do this:
vv.zip(ii).map( t => Vectors.dense(t._1.toArray ++ Array(t._2.toDouble) ) )
where vv is your RDD[Vector] and ii is your RDD[Int]. It may not be the most efficient way, but it is the easiest one.
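For completeness, here is a minimal self-contained sketch of the same idea. The object name AppendColumn, the helper names appendColumn and appendColumnByIndex, the local[2] master, and the sample data are all made up for illustration; only zip, zipWithIndex, join, and Vectors.dense come from Spark itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, used only to make the example runnable.
object AppendColumn {

  // Same approach as above: pair rows positionally with zip, then build a
  // new dense vector with the Int appended as the last component.
  def appendColumn(vv: RDD[Vector], ii: RDD[Int]): RDD[Vector] =
    vv.zip(ii).map { case (v, i) => Vectors.dense(v.toArray :+ i.toDouble) }

  // Alternative for RDDs whose partitioning differs: key both sides by
  // row index and join on that index (slower, because the join shuffles).
  def appendColumnByIndex(vv: RDD[Vector], ii: RDD[Int]): RDD[Vector] = {
    val left  = vv.zipWithIndex().map { case (v, idx) => (idx, v) }
    val right = ii.zipWithIndex().map { case (i, idx) => (idx, i) }
    left.join(right)
      .sortByKey() // restore the original row order after the shuffle
      .values
      .map { case (v, i) => Vectors.dense(v.toArray :+ i.toDouble) }
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup; adjust master and app name for a real cluster.
    val conf = new SparkConf().setAppName("append-column").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Sample data: two rows, same number of partitions on both sides.
      val vv = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0)), 2)
      val ii = sc.parallelize(Seq(10, 20), 2)
      appendColumn(vv, ii).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

One thing to keep in mind: zip only works when both RDDs have the same number of partitions and the same number of elements in each partition. If var1 and var2 were produced by different pipelines, that may not hold even when the total row counts match, and the index-based variant is the safer choice.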
Upvotes: 1