Reputation: 6465
I have an RDD of type RDD[(Int, Double)], where the first element of each pair is the index and the second is the value. I'd like to convert this RDD to a Vector to use for classification. Could someone help me with that?
I have the following code, but it's not working:
def vectorize(x:RDD[(Int,Double)], size: Int):Vector = {
val vec = Vectors.sparse(size,x)
}
Upvotes: 1
Views: 3536
Reputation: 330073
Since org.apache.spark.mllib.linalg.Vector is a local data structure, you have to collect your data:
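import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
def vectorize(x: RDD[(Int, Double)], size: Int): Vector = {
  // collect brings every (index, value) pair to the driver
  Vectors.sparse(size, x.collect)
}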
Because the resulting vector is not distributed, you have to be sure the collected data fits in driver memory.
In general, this operation is not particularly useful. If your data can easily be handled with local data structures, it probably shouldn't be stored in an RDD in the first place.
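To try the conversion end to end, here is a minimal sketch, assuming Spark with MLlib is on the classpath and a local master is acceptable. The object name VectorizeExample, the application name, and the sample pairs are made up for illustration and are not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, used only to make the sketch runnable.
object VectorizeExample {
  // Pulls every (index, value) pair to the driver and builds a local sparse vector.
  def vectorize(x: RDD[(Int, Double)], size: Int): Vector =
    Vectors.sparse(size, x.collect().toSeq)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this tiny example.
    val conf = new SparkConf().setAppName("vectorize-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: three non-zero entries in a vector of length 10.
      val pairs = sc.parallelize(Seq((0, 1.0), (3, 2.5), (7, -4.0)))
      val vec = vectorize(pairs, size = 10)
      println(vec)
    } finally {
      sc.stop()
    }
  }
}
```

Indices that do not appear in the RDD are treated as zero, so size must be larger than the largest index, and each index should appear only once.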
Upvotes: 2