Reputation: 17676
How can I convert a single column in spark 2.0.1 into an array?
+---+-----+
| id| dist|
+---+-----+
|1.0|2.0|
|2.0|4.0|
|3.0|6.0|
|4.0|8.0|
+---+-----+
should return Array(1.0, 2.0, 3.0, 4.0)
A
import scala.collection.JavaConverters._
df.select("id").collectAsList.asScala.toArray
fails with
java.lang.RuntimeException: Unsupported array type: [Lorg.apache.spark.sql.Row;
java.lang.RuntimeException: Unsupported array type: [Lorg.apache.spark.sql.Row;
Upvotes: 7
Views: 21763
Reputation: 5315
Why do you use JavaConverters if you then re-transform the Java List to a Scala List ? You just need to collect the dataset and then map this array of Rows to an array of doubles, like this :
df.select("id").collect.map(_.getDouble(0))
Upvotes: 9
Reputation: 35404
I'd try something like this with dataframe aggregate function - collect_list() to avoid memory overhead on the driver JVM. With this approach only selected column values will be copied to driver JVM.
df.select(collect_list("id")).first().getList[Double](0)
This returns java.util.List[Double]
.
Upvotes: 6