Seiji Armstrong

Reputation: 1185

How to convert scala vector to spark ML vector?

I have a vector of type scala.collection.immutable.Vector and would like to convert it to a vector of type org.apache.spark.ml.linalg.Vector.

For example, I want something like the following:

import org.apache.spark.ml.linalg.Vectors
val scalaVec = Vector(1,2,3)
val sparkVec = Vectors.dense(scalaVec)

Note that I could simply type val sparkVec = Vectors.dense(1,2,3), but I want to convert existing Scala collection Vectors. I want to do this so I can embed these DenseVectors in a DataFrame to feed into spark.ml pipelines.

Upvotes: 2

Views: 6191

Answers (2)

pwb2103

Reputation: 540

Vectors.dense can take an array of Doubles. What is likely causing your trouble is that Vectors.dense won't accept the Ints you are using in scalaVec in your example. So the following fails:

import org.apache.spark.ml.linalg.Vectors
val test = Seq(1, 2, 3, 4, 5).to[scala.Vector].toArray
Vectors.dense(test)

test: Array[Int] = Array(1, 2, 3, 4, 5)
<console>:67: error: overloaded method value dense with alternatives:
  (values: Array[Double])org.apache.spark.ml.linalg.Vector <and>
  (firstValue: Double,otherValues: Double*)org.apache.spark.ml.linalg.Vector
 cannot be applied to (Array[Int])
   Vectors.dense(test)

While this works:

val testDouble = Seq(1, 2, 3, 4, 5).map(x => x.toDouble).to[scala.Vector].toArray
Vectors.dense(testDouble)

testDouble: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0)
res11: org.apache.spark.ml.linalg.Vector = [1.0,2.0,3.0,4.0,5.0]
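Putting this together with the question's goal: the conversion itself is pure Scala (widen the Ints to Doubles, then materialise an array), and only the final dense call needs Spark. A minimal sketch, with the Spark step left as a comment since it assumes spark-mllib on the classpath:

```scala
// scala.collection.immutable.Vector[Int] from the question
val scalaVec = Vector(1, 2, 3)

// Widen to Double and build the Array[Double] that the
// Vectors.dense(values: Array[Double]) overload expects
val asDoubles: Array[Double] = scalaVec.map(_.toDouble).toArray

// With spark-mllib on the classpath:
// import org.apache.spark.ml.linalg.Vectors
// val sparkVec = Vectors.dense(asDoubles)
```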

Upvotes: 3

Kuladip

Reputation: 1

You can pass the vector elements as varargs. Note that the elements must be Doubles, and since the dense(firstValue: Double, otherValues: Double*) overload takes the first value as a separate parameter, the sequence has to be split into head and tail:

val scalaVec = Vector(1.0, 2.0, 3.0)
val sparkVec = Vectors.dense(scalaVec.head, scalaVec.tail: _*)
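A caveat on the varargs route: dense(firstValue: Double, otherValues: Double*) takes the first element as a separate parameter, so a plain `scalaVec: _*` expansion does not line up with that signature, and the elements must already be Doubles. A stdlib-only sketch of the call shape, using a hypothetical stand-in `dense` with the same signature (Spark itself is not imported here):

```scala
// Stand-in with the same shape as Vectors.dense(firstValue: Double, otherValues: Double*)
def dense(firstValue: Double, otherValues: Double*): Array[Double] =
  (firstValue +: otherValues).toArray

val scalaVec = Vector(1.0, 2.0, 3.0)
// The `: _*` expansion feeds only the repeated parameter, so split off the head
val result = dense(scalaVec.head, scalaVec.tail: _*)
```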

Upvotes: 0