Reputation: 147
I have a dataframe with several columns. One of these is a map (MapType). The keys inside this map-column differ from row to row. This means something like this is possible:
+----------+-----------------+
|     col_1|            col_2|
+----------+-----------------+
|         7| key_1 -> value_1|
|         5| key_2 -> value_2|
|         4| key_3 -> value_3|
+----------+-----------------+
What I want to do is add the first column to this map column as a new entry, to get something like:
+----------+-----------------------------+
|     col_1|                        col_2|
+----------+-----------------------------+
|         7| key_1 -> value_1, col_1 -> 7|
|         5| key_2 -> value_2, col_1 -> 5|
|         4| key_3 -> value_3, col_1 -> 4|
+----------+-----------------------------+
But I can't figure out how to add the first column to the map while preserving the individual keys inside the map column.
Upvotes: 0
Views: 100
Reputation: 15258
Since Spark 2.4.0, you have access to many new functions for manipulating map types.
Assuming df is your dataframe:
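from pyspark.sql import functions as F

# Append a single-entry map {"col_1": <value of col_1>} to the existing map.
# Each entry struct needs two fields: the key (a literal) and the value.
# map_concat expects both maps to have the same key and value types, so
# col_1 may need a cast, e.g. F.col("col_1").cast("string").
df.withColumn(
    "col_2",
    F.map_concat(
        F.col("col_2"),
        F.map_from_entries(
            F.array(F.struct(F.lit("col_1"), F.col("col_1")))
        ),
    ),
)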
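To check the expected result without a Spark session, the per-row effect of this merge can be modeled with plain Python dicts. This is only a sketch of the semantics, not Spark code: add_column_to_map is a hypothetical helper name, the map values are assumed to be strings (so col_1 is converted to a string), and the sketch simply rejects a key that already exists, since how Spark treats duplicate map keys depends on the version and configuration.

```python
# Plain-Python model of the per-row result; this is not Spark code.
# add_column_to_map is a hypothetical helper name used for illustration.

def add_column_to_map(row, column, map_column):
    """Return a copy of `row` whose map column also holds column -> its value."""
    existing = row[map_column]
    if column in existing:
        # Spark's handling of duplicate map keys depends on version and
        # configuration, so this sketch simply refuses them.
        raise ValueError(f"key {column!r} already present in {map_column!r}")
    # Convert to str so the new entry matches the (assumed) string map values.
    merged = {**existing, column: str(row[column])}
    return {**row, map_column: merged}


# Sample rows taken from the table in the question.
rows = [
    {"col_1": 7, "col_2": {"key_1": "value_1"}},
    {"col_1": 5, "col_2": {"key_2": "value_2"}},
    {"col_1": 4, "col_2": {"key_3": "value_3"}},
]

result = [add_column_to_map(r, "col_1", "col_2") for r in rows]
for r in result:
    print(r["col_1"], r["col_2"])
```

Each row keeps its own original key and gains one extra col_1 entry, which is the shape asked for in the question. The input rows are left unchanged because the helper builds new dicts instead of mutating them.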
Upvotes: 1