xinit

Reputation: 159

Extract Array[T] from Spark dataframe in Scala

I am trying to extract a particular array (of type Double) from the row where another column has its minimum value. The following code finds the array, but I am unable to receive it as Array[Double]. I tried mapping and casting as suggested in other threads but could not solve the problem. I would be grateful for any hints. The following is an illustration:

scala> df.show
+----+---------------+
|time|           crds|
+----+---------------+
|12.0|[0.1, 2.1, 1.2]|
| 8.0|[1.1, 2.1, 3.2]|
| 9.0|[1.1, 1.1, 2.2]|
+----+---------------+


scala> val minTime = df.select(min(col("time"))).collect()(0)(0).toString.toDouble
minTime: Double = 8.0

scala> val crd = df.filter($"time" === minTime).select($"crds").take(1)
crd: Array[org.apache.spark.sql.Row] = Array([WrappedArray(1.1, 2.1, 3.2)])

scala> val res: Array[Double] = crd.array
<console>:29: error: type mismatch;
 found   : Array[org.apache.spark.sql.Row]
 required: Array[Double]
   val res: Array[Double] = crd.array
                                ^

scala>

Upvotes: 1

Views: 330

Answers (2)

Ged

Reputation: 18128

...
import scala.collection.mutable.WrappedArray
val crd = df.filter...select($"crds").first.getAs[WrappedArray[Double]](0).toArray
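For completeness, a minimal self-contained sketch of the same idea, assuming the df from the question (time: Double, crds: Array[Double]). Using Row.getSeq avoids naming the concrete collection type, which can vary by version (WrappedArray on Spark 2.x, mutable.ArraySeq on Spark 3.x with Scala 2.13):

import org.apache.spark.sql.functions.min

// find the minimum time, then pull the matching row's array out of the Row
val minTime = df.select(min($"time")).as[Double].first
val res: Array[Double] = df
  .filter($"time" === minTime)
  .select($"crds")
  .first
  .getSeq[Double](0)   // Seq[Double] regardless of the runtime collection
  .toArray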

Upvotes: 1

Quiescent

Reputation: 1144

Maybe a bit cumbersome, but it works, assuming there is only one hit for the minimum.

scala> val df = Seq(
     |    (12.0, Array(0.1, 2.1, 1.2)),
     |    (8.0, Array(1.1, 2.1, 3.2)),
     |    (9.0, Array(1.1, 1.1, 2.2))
     | ).toDF("time", "crds")
df: org.apache.spark.sql.DataFrame = [time: double, crds: array<double>]

scala> val minTime = df.select(min(col("time"))).collect()(0)(0).toString.toDouble
minTime: Double = 8.0

scala> val crd = df.filter($"time" === minTime).select(explode(col("crds"))).collect().map(i => i(0)).map(_.toString.toDouble)
crd: Array[Double] = Array(1.1, 2.1, 3.2)

scala>
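A variant of the same idea (again just a sketch against the question's df) that avoids collecting twice, by sorting on time and taking the first row; note it likewise returns only a single row when the minimum is tied:

// one job: sort ascending by time and keep only the top row's array
val crd: Array[Double] = df
  .orderBy($"time".asc)
  .select($"crds")
  .head
  .getAs[Seq[Double]](0)
  .toArray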

Upvotes: 1
