Rory

Reputation: 383

Split a map column into two columns in PySpark

I have a dataframe with a map:

sdf = spark.createDataFrame(
    [
        (1,  {'Kira':25,'Lilly':15}),  
        (2, {'Tom':14}),
    ],
    ["id", "label"]  
)
+---+-------------------------+
|id |label                    |
+---+-------------------------+
|1  |{Lilly -> 15, Kira -> 25}|
|2  |{Tom -> 14}              |
+---+-------------------------+

And I want to put the keys in one column and the values in another, like this:

+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Kira |25 |
|1  |Lilly|15 |
|2  |Tom  |14 |
+---+-----+---+

Upvotes: 0

Views: 343

Answers (1)

wwnde

Reputation: 26676

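The long way: use the map collection functions map_keys and map_values to create the name and age columns as arrays, zip them with arrays_zip, then use the SQL inline function to explode the resulting array of structs into one row per entry.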

from pyspark.sql.functions import map_keys, map_values
sdf.withColumn('name', map_keys('label')).withColumn('age', map_values('label')).selectExpr('id', 'inline(arrays_zip(name, age))').show()

+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Lilly| 15|
|  1| Kira| 25|
|  2|  Tom| 14|
+---+-----+---+

Upvotes: 1
