Reputation: 159
I am trying to extract a particular array (of type Double) from the row where another column has its minimum value. The following code extracts the array, but I am unable to receive it as Array[Double]. I tried mapping and casting as suggested in other threads, but could not solve the problem. I would be grateful for any hints. The following illustrates the problem:
scala> df.show
+----+---------------+
|time|           crds|
+----+---------------+
|12.0|[0.1, 2.1, 1.2]|
| 8.0|[1.1, 2.1, 3.2]|
| 9.0|[1.1, 1.1, 2.2]|
+----+---------------+
scala> val minTime = df.select(min(col("time"))).collect()(0)(0).toString.toDouble
minTime: Double = 8.0
scala> val crd = df.filter($"time" === minTime).select($"crds").take(1)
crd: Array[org.apache.spark.sql.Row] = Array([WrappedArray(1.1, 2.1, 3.2)])
scala> val res: Array[Double] = crd.array
<console>:29: error: type mismatch;
found : Array[org.apache.spark.sql.Row]
required: Array[Double]
val res: Array[Double] = crd.array
^
scala>
Upvotes: 1
Views: 330
Reputation: 18128
...
import scala.collection.mutable.WrappedArray
val crd = df.filter...select($"crds").first.getAs[WrappedArray[Double]](0).toArray
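For completeness, here is a minimal sketch of the same idea as a standalone program. It assumes the column names `time` and `crds` from the question; the object name `CrdsAtMinTime` and the helper `crdsAtMinTime` are made up for illustration. It reads the array as `Seq[Double]` rather than `WrappedArray[Double]`, which should also compile on Scala 2.13 builds where `WrappedArray` is deprecated. It assumes the DataFrame is not empty.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, min}

object CrdsAtMinTime {
  // Hypothetical helper: returns the "crds" array of the first row whose "time" is minimal.
  // Assumes df is non-empty and has columns time: double, crds: array<double>.
  def crdsAtMinTime(df: DataFrame): Array[Double] = {
    // min over a double column is returned as a Double, so no toString round trip is needed.
    val minTime = df.select(min(col("time"))).first.getDouble(0)

    df.filter(col("time") === minTime)
      .select(col("crds"))
      .first                  // a Row, not the array itself; fails if no row matches
      .getAs[Seq[Double]](0)  // an array<double> column is exposed as a Seq[Double]
      .toArray                // explicit conversion to Array[Double]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("crds-at-min-time")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (12.0, Array(0.1, 2.1, 1.2)),
      (8.0, Array(1.1, 2.1, 3.2)),
      (9.0, Array(1.1, 1.1, 2.2))
    ).toDF("time", "crds")

    println(crdsAtMinTime(df).mkString("Array(", ", ", ")"))
    spark.stop()
  }
}
```

The type mismatch in the question comes from `take(1)` returning `Array[Row]`: the array of doubles still has to be pulled out of the `Row` with `getAs` (or `getSeq`) before calling `toArray`.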
Upvotes: 1
Reputation: 1144
Maybe a bit cumbersome, but this works, assuming only one row has the minimum time.
scala> val df = Seq(
| (12.0, Array(0.1, 2.1, 1.2)),
| (8.0, Array(1.1, 2.1, 3.2)),
| (9.0, Array(1.1, 1.1, 2.2))
| ).toDF("time", "crds")
df: org.apache.spark.sql.DataFrame = [time: double, crds: array<double>]
scala> val minTime = df.select(min(col("time"))).collect()(0)(0).toString.toDouble
minTime: Double = 8.0
scala> val crd = df.filter($"time" === minTime).select(explode(col("crds"))).collect().map(i => i(0)).map(_.toString.toDouble)
crd: Array[Double] = Array(1.1, 2.1, 3.2)
scala>
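Note that if several rows share the minimum time, `explode` concatenates all of their arrays into one result. As a possible alternative, the sketch below sorts by `time` and takes a single row, reading the column through a typed `Dataset[Array[Double]]` so no per-element conversion is needed. The object and helper names are hypothetical, the column names are taken from the question, and on ties Spark may return any one of the tied rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CrdsByOrdering {
  // Hypothetical helper: the "crds" array of one row with the smallest "time",
  // or None if the DataFrame is empty.
  def crdsAtMinTime(df: DataFrame): Option[Array[Double]] = {
    val spark = df.sparkSession
    import spark.implicits._ // brings the Encoder[Array[Double]] into scope

    df.orderBy(col("time"))  // ascending, so the minimum comes first
      .select(col("crds"))
      .as[Array[Double]]     // typed view of the single array<double> column
      .take(1)               // at most one row is brought to the driver
      .headOption
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("crds-by-ordering")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (12.0, Array(0.1, 2.1, 1.2)),
      (8.0, Array(1.1, 2.1, 3.2)),
      (9.0, Array(1.1, 1.1, 2.2))
    ).toDF("time", "crds")

    println(crdsAtMinTime(df).map(_.mkString("Array(", ", ", ")")))
    spark.stop()
  }
}
```

Returning an `Option` is a design choice here: it makes the empty-DataFrame case explicit instead of throwing from `first` or `head`.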
Upvotes: 1