Reputation: 800
I have the following case class:
case class Station(id: Long, name: String) extends Node
and a Spark Dataset of stations:
vertices: org.apache.spark.sql.Dataset[Station] = [id: bigint, name: string]
I would like to convert the vertices Dataset to a Seq[Station]. I found a lot of tutorials on how to create a Dataset from a sequence, but not the other way around. Do you have any hints?
Upvotes: 2
Views: 6120
Reputation: 866
You can use collect to convert the Dataset to an Array, and then convert that Array to a Seq:
val verticesSeq: Seq[Station] = vertices.collect().toSeq
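A minimal, self-contained sketch of the collect approach, assuming Spark 2.x or 3.x on the classpath and a local SparkSession. The Station class here drops the question's extends Node because Node isn't shown; the object name CollectToSeq, the helper toStationSeq and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Simplified stand-in for the question's class (the Node parent is omitted).
// It must be top-level so Spark can derive an Encoder for it.
case class Station(id: Long, name: String)

object CollectToSeq {
  // Hypothetical helper: collect() pulls every row to the driver as an
  // Array[Station]; toSeq then converts that array to a Seq.
  def toStationSeq(ds: Dataset[Station]): Seq[Station] =
    ds.collect().toSeq

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the app name is arbitrary.
    val spark = SparkSession.builder()
      .appName("collect-to-seq")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder[Station] used by toDS()

    try {
      val vertices: Dataset[Station] =
        Seq(Station(1L, "Alpha"), Station(2L, "Beta")).toDS()

      val verticesSeq: Seq[Station] = toStationSeq(vertices)
      verticesSeq.foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```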
Use it with caution, though. As the Spark API documentation for collect warns:
Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
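If the Dataset may be large, the same Dataset API offers two bounded alternatives: take(n), which returns at most n rows to the driver, and toLocalIterator(), which fetches one partition at a time so the driver needs roughly as much memory as the largest partition. The sketch below assumes Spark 2.x or 3.x and a local SparkSession; the Station class (without Node), the object BoundedCollect, its helpers and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Simplified stand-in for the question's class (the Node parent is omitted).
case class Station(id: Long, name: String)

object BoundedCollect {
  // Hypothetical helper: take(n) ships at most n rows to the driver.
  def firstStations(ds: Dataset[Station], n: Int): Seq[Station] =
    ds.take(n).toSeq

  // Hypothetical helper: toLocalIterator() fetches one partition at a time,
  // so the rows are processed without materializing the whole Dataset.
  def countLongNames(ds: Dataset[Station], minLength: Int): Int = {
    val it = ds.toLocalIterator() // java.util.Iterator[Station]
    var count = 0
    while (it.hasNext) {
      if (it.next().name.length >= minLength) count += 1
    }
    count
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bounded-collect")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder[Station] used by toDS()

    try {
      val vertices =
        Seq(Station(1L, "Alpha"), Station(2L, "Be"), Station(3L, "Gamma")).toDS()
      println(firstStations(vertices, 2))
      println(countLongNames(vertices, 5))
    } finally {
      spark.stop()
    }
  }
}
```

take(n) still moves data to the driver, so it is only safe when n is small; toLocalIterator trades memory for extra round trips, since each partition is fetched by a separate job.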
Upvotes: 8