Add column to existing MapType column

Question

I have a DataFrame with several columns. One of them is a map (MapType), and the keys inside this map column differ from row to row, so data like this is possible:

+----------+-----------------+
|     col_1|            col_2|
+----------+-----------------+
|         7| key_1 -> value_1|
|         5| key_2 -> value_2|
|         4| key_3 -> value_3|
+----------+-----------------+

What I want to do is add the first column to this map column, to get something like:

+----------+-----------------------------+
|     col_1|                        col_2|
+----------+-----------------------------+
|         7| key_1 -> value_1, col_1 -> 7|
|         5| key_2 -> value_2, col_1 -> 5|
|         4| key_3 -> value_3, col_1 -> 4|
+----------+-----------------------------+

But I can't figure out how to add the first column to the map while preserving the individual keys already inside the map column.

Steven · Accepted Answer

Since Spark 2.4.0, you have access to several new functions for manipulating map types, including map_concat and map_from_entries.
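To make the per-row behaviour of these two functions concrete, here is a plain-Python model (not Spark code) that treats each map as a dict. The helper names mirror the Spark functions but are hypothetical stand-ins; raising on a duplicate key mimics Spark 3's default spark.sql.mapKeyDedupPolicy=EXCEPTION, which is an assumption about your Spark version.

```python
def map_from_entries(entries):
    """Hypothetical stand-in for Spark's map_from_entries.

    `entries` is a list of (key, value) pairs, mirroring an array of
    two-field structs. Duplicate keys raise, like Spark 3's default
    spark.sql.mapKeyDedupPolicy=EXCEPTION.
    """
    result = {}
    for key, value in entries:
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


def map_concat(*maps):
    """Hypothetical stand-in for Spark's map_concat: merge maps left to right."""
    result = {}
    for m in maps:
        for key, value in m.items():
            if key in result:
                raise ValueError(f"duplicate map key: {key!r}")
            result[key] = value
    return result


# The sample rows from the question, as (col_1, col_2) pairs.
rows = [
    (7, {"key_1": "value_1"}),
    (5, {"key_2": "value_2"}),
    (4, {"key_3": "value_3"}),
]

# Per row: col_2 = map_concat(col_2, map_from_entries([("col_1", str(col_1))]))
new_rows = [
    (col_1, map_concat(col_2, map_from_entries([("col_1", str(col_1))])))
    for col_1, col_2 in rows
]

for row in new_rows:
    print(row)
# (7, {'key_1': 'value_1', 'col_1': '7'})
# (5, {'key_2': 'value_2', 'col_1': '5'})
# (4, {'key_3': 'value_3', 'col_1': '4'})
```

If a row's map already contained a "col_1" key, this model raises, just as Spark 3 does by default; setting spark.sql.mapKeyDedupPolicy to LAST_WIN makes the later value win instead.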

Assuming df is your DataFrame, build a one-entry map from col_1 and merge it into col_2:

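from pyspark.sql import functions as F

# map_from_entries needs an array of two-field (key, value) structs, so the
# struct holds the literal key "col_1" and the column's value. The value is
# cast to string so both maps have the same map<string,string> type.
df = df.withColumn(
    "col_2",
    F.map_concat(
        F.col("col_2"),
        F.map_from_entries(
            F.array(F.struct(F.lit("col_1"), F.col("col_1").cast("string")))
        ),
    ),
)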
