Reputation: 547
I have a dataframe with a column that is a map, like this:
scala> df.printSchema
root
 |-- A1: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
I need to select all the keys from the dataframe as column names and the values as rows.
For example, say I have 2 records like this:
1. key1 -> value1, key2 -> value2, key3 -> value3 ....
2. key1 -> value11, key3 -> value13, key4 -> value14 ...
I want the output dataframe to be:
key1     key2    key3     key4
value1   value2  value3   null
value11  null    value13  value14
How can I do this?
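For anyone who wants to reproduce this, a minimal sketch of building such a DataFrame (a local Spark session and the column name `A1` are assumed from the question; the rest is illustrative setup):

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation only.
val spark = SparkSession.builder().master("local[*]").appName("map-pivot").getOrCreate()
import spark.implicits._

// Two rows, each a map<string,string>, matching the schema in the question.
val df = Seq(
  Map("key1" -> "value1", "key2" -> "value2", "key3" -> "value3"),
  Map("key1" -> "value11", "key3" -> "value13", "key4" -> "value14")
).toDF("A1")

df.printSchema()
```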
Upvotes: 2
Views: 4660
Reputation: 494
Assuming that the column with the Map is named "my_map":
import org.apache.spark.sql.functions.{col, collect_set, explode, map_keys}

// Collect the distinct set of map keys into a driver-side Seq[String].
val keys = df
  .select(explode(map_keys(col("my_map"))).as("keys_to_rows"))
  .agg(collect_set("keys_to_rows"))
  .collect()
  .head.getSeq[String](0)

// Project each key's value out of the map as its own column.
df.select(
  keys.map(key => col(s"my_map.$key").as(key)): _*
)
Upvotes: 0
Reputation: 24198
First we need to create an id column by which we can group your data, then explode the map column A1, and finally reshape your df using pivot():
import org.apache.spark.sql.functions.{monotonically_increasing_id, explode, first}

df.withColumn("id", monotonically_increasing_id())
  .select($"id", explode($"A1"))
  .groupBy("id")
  .pivot("key")
  .agg(first("value"))
  .show()
+---+-------+------+-------+-------+
| id| key1| key2| key3| key4|
+---+-------+------+-------+-------+
| 0| value1|value2| value3| null|
| 1|value11| null|value13|value14|
+---+-------+------+-------+-------+
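As a side note (an assumption about the workload, not part of the original answer): if the set of keys is known in advance, you can pass them explicitly to pivot(), which lets Spark skip the extra job it otherwise runs to determine the distinct pivot values. A self-contained sketch, with the sample data rebuilt inline:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{monotonically_increasing_id, explode, first}

val spark = SparkSession.builder().master("local[*]").appName("map-pivot").getOrCreate()
import spark.implicits._

// Sample data matching the question's two records.
val df = Seq(
  Map("key1" -> "value1", "key2" -> "value2", "key3" -> "value3"),
  Map("key1" -> "value11", "key3" -> "value13", "key4" -> "value14")
).toDF("A1")

// Supplying the key list ("key1".."key4", assumed known) to pivot()
// avoids the distinct-values scan that the one-argument overload performs.
val pivoted = df.withColumn("id", monotonically_increasing_id())
  .select($"id", explode($"A1"))
  .groupBy("id")
  .pivot("key", Seq("key1", "key2", "key3", "key4"))
  .agg(first("value"))

pivoted.show()
```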
Upvotes: 7