Reputation: 65
I have a Dataset in Spark with the following schema:
root
|-- from: struct (nullable = false)
| |-- id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- tags: string (nullable = true)
|-- v1: struct (nullable = false)
| |-- id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- tags: string (nullable = true)
|-- v2: struct (nullable = false)
| |-- id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- tags: string (nullable = true)
|-- v3: struct (nullable = false)
| |-- id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- tags: string (nullable = true)
|-- to: struct (nullable = false)
| |-- id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- tags: string (nullable = true)
How can I make a table (with only three columns: id, name, tags) from this Dataset in Scala?
Upvotes: 0
Views: 106
Reputation: 330093
Just combine all the columns into an array, explode, and select all nested fields:
import org.apache.spark.sql.functions.{array, col, explode}
import spark.implicits._ // needed for .toDF; assumes an in-scope SparkSession named spark

case class Vertex(id: String, name: String, tags: String)

val df = Seq((
  Vertex("1", "from", "a"), Vertex("2", "V1", "b"), Vertex("3", "V2", "c"),
  Vertex("4", "v3", "d"), Vertex("5", "to", "e")
)).toDF("from", "v1", "v2", "v3", "to")

// Wrap all top-level struct columns into an array, explode to get one row
// per struct, then flatten the struct fields with col.*
df.select(explode(array(df.columns map col: _*)).alias("col")).select("col.*")
which gives the following result:
+---+----+----+
| id|name|tags|
+---+----+----+
| 1|from| a|
| 2| V1| b|
| 3| V2| c|
| 4| v3| d|
| 5| to| e|
+---+----+----+
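For comparison, a minimal alternative sketch that avoids explode by projecting the nested fields out of each struct column and unioning the rows. It assumes the same df and imports as above; flattened is just an illustrative name:

// Hypothetical alternative: select id, name, tags from each struct column,
// then union the per-column results into a single three-column DataFrame.
val flattened = df.columns
  .map(c => df.select(col(s"$c.id"), col(s"$c.name"), col(s"$c.tags")))
  .reduce(_ union _)

flattened.show()

Selecting a nested field such as from.id yields a column named id, so every per-column projection has the same (id, name, tags) schema and the union is well-formed. The explode version is usually preferable because it stays in a single select rather than building one plan branch per column.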
Upvotes: 2