Reputation: 1345
I have a data frame doubleSeq whose structure is as below
res274: org.apache.spark.sql.DataFrame = [finalFeatures: vector]
The first record of the column is as follows
res281: org.apache.spark.sql.Row = [[3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]]
I want to extract the double array
[3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]
from this -
doubleSeq.head(1)(0)(0)
gives
Any = [3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]
Which is not solving my problem
Scala Spark - split vector column into separate columns in a Spark DataFrame
Is not solving my issue but its an indicator
Upvotes: 1
Views: 1437
Reputation: 10406
So you want to extract a Vector from a Row, and turn it into an array of doubles.
The problem with your code is that the get
method (and the implicit apply
method you are using) returns an object of type Any
. Indeed, a Row
is a generic, unparametrized object and there is no way to now at compile time what types it contains. It's a bit like Lists in java 1.4 and before. To solve it in spark, you can use the getAs
method that you can parametrize with a type of your choosing.
In your situation, you seem to have a dataframe containing a vector (org.apache.spark.ml.linalg.Vector
).
import org.apache.spark.ml.linalg._
val firstRow = df.head(1)(0) // or simply df.head
val vect : Vector = firstRow.getAs[Vector](0)
// or all in one: df.head.getAs[Vector](0)
// to transform into a regular array
val array : Array[Double] = vect.toArray
Note also that you can access columns by name like this:
val vect : Vector = firstRow.getAs[Vector]("finalFeatures")
Upvotes: 3