Ira Re

Reputation: 800

How to convert a Spark Dataset to a Scala Seq

I have the following case class

case class Station(id: Long, name: String) extends Node

and a Spark Dataset of stations

vertices: org.apache.spark.sql.Dataset[Station] = [id: bigint, name: string]

I would like to convert the vertices Dataset to a Seq[Station]. I found a lot of tutorials about how to create a Dataset from a sequence but not vice versa. Do you have any hint for me?

Upvotes: 2

Views: 6120

Answers (1)

Thomas Francois

Reputation: 866

You can use collect to bring the Dataset's rows to the driver as an Array[Station], which you can then convert to a Seq:

val verticesSeq: Seq[Station] = vertices.collect().toSeq

Use with caution though:

Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
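
If the Dataset might be too large to collect in one go, two standard Dataset methods can reduce the pressure on the driver: take(n) returns only the first n rows, and toLocalIterator() hands rows to the driver one partition at a time. The following is only a rough sketch. The DriverSafeCollect object and its firstN and stream helpers are hypothetical names made up for illustration, Station is simplified (the extends Node from the question is left out), and the import assumes a Scala 2.12 build of Spark.

```scala
import org.apache.spark.sql.Dataset
import scala.collection.JavaConverters._

// Simplified version of the question's case class (assumption: Node is omitted).
case class Station(id: Long, name: String)

// Hypothetical helpers, not part of Spark itself.
object DriverSafeCollect {

  // Brings at most n rows to the driver instead of the whole Dataset.
  def firstN[T](ds: Dataset[T], n: Int): Seq[T] =
    ds.take(n).toSeq

  // Iterates over all rows while holding roughly one partition
  // in driver memory at a time. toLocalIterator() returns a
  // java.util.Iterator, so it is wrapped as a Scala Iterator here.
  def stream[T](ds: Dataset[T]): Iterator[T] =
    ds.toLocalIterator().asScala
}
```

For example, DriverSafeCollect.firstN(vertices, 100) would give a Seq[Station] with at most 100 elements, while DriverSafeCollect.stream(vertices).foreach(println) processes every station without building the full Seq. Note that calling toSeq on the iterator would still materialize everything on the driver, so this only helps if you can process the rows incrementally.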

Upvotes: 8
