Nick01

Reputation: 369

Updating a map column in a DataFrame (Spark/Scala)

I have a DataFrame with a map-typed column, mapping, which I pull out as origMap. I want to add more entries to that map.

Here is what I'm trying, but it isn't working:

val origMap = df("mapping")

val tempMap = tempDFFields.flatMap(tempField => Array(lit(tempField), tempDF(tempField)))

origMap.withColumn("mapping", tempMap.union(origMap))

tempDFFields is a list of the column names in tempDF.

I'm creating a map of all colName -> colValue entries from tempDF and want to add them to the map in the original DF. Spark complains that I'm passing an array of Column instead of a single instance of Column. How can I pass a single Column here? I just want to update the map and store it back.
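For context, a self-contained sketch of what that flatMap produces (assuming a one-column tempDF like the example below): the result is a Seq[Column], not a single Column, which is what the error is about; Spark's built-in map(...) varargs function is one way to collapse it into a single map-typed Column.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{lit, map}
import spark.implicits._

// Stand-ins for the question's tempDF / tempDFFields (one column, as in the example below)
val tempDF = Seq(1).toDF("Id")
val tempDFFields = tempDF.columns.toSeq

// flatMap yields a flat sequence of alternating key/value columns,
// i.e. a Seq[Column] rather than a single Column -- hence the error
val kvCols: Seq[Column] = tempDFFields.flatMap(f => Seq(lit(f), tempDF(f)))

// map(...) takes Column varargs, so expanding the Seq with `: _*`
// collapses it into one single map-typed Column
val tempMap: Column = map(kvCols: _*)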

Example:

Input

origDF

+---+------+----------+
|id |amount|mapping   |
+---+------+----------+
|1  |10    |{a=b, c=d}|
|3  |90    |{e=f, g=h}|
+---+------+----------+

tempDF

+---+
|Id |
+---+
|1  |
+---+

Output: origDF

+---+------+----------------+
|id |amount|mapping         |
+---+------+----------------+
|1  |10    |{a=b, c=d, id=1}|
|3  |90    |{e=f, g=h, id=1}|
+---+------+----------------+

Upvotes: 1

Views: 1238

Answers (1)

koiralo

Reputation: 23119

You can create a UDF to merge the maps, as below.

import org.apache.spark.sql.functions.udf
import spark.implicits._ // for toDF and $

val origDF = Seq(
  (1, 10, Map("a" -> "b", "c" -> "d")),
  (3, 90, Map("e" -> "f", "g" -> "h"))
).toDF("id", "amount", "mapping")

If you have a single-row DF, you can create the map directly:
val tmpDF = Map("id" -> "1")
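(If you would rather derive that lookup from the question's single-row tempDF instead of writing it by hand, one possible sketch is below; collecting is only reasonable here because the DF has one row, and tmpMap is a hypothetical name that could stand in for tmpDF above.)

// Derive a Map[String, String] from the single row of tempDF
// (assumes the question's tempDF is in scope and its values are non-null)
val firstRow = tempDF.head()
val tmpMap: Map[String, String] =
  tempDF.columns.map(c => c -> firstRow.getAs[Any](c).toString).toMap
// e.g. Map("Id" -> "1")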

// UDF to merge the two maps
val addToMap = udf((mapping: Map[String, String]) => mapping ++ tmpDF)

// Use the UDF and show the result
origDF.withColumn("mapping", addToMap($"mapping"))
  .show(false)

Output:

+---+------+----------------------------+
|id |amount|mapping                     |
+---+------+----------------------------+
|1  |10    |Map(a -> b, c -> d, id -> 1)|
|3  |90    |Map(e -> f, g -> h, id -> 1)|
+---+------+----------------------------+
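For completeness, on Spark 2.4 or later the same merge can also be done without a UDF using the built-in map_concat function; a sketch under that version assumption, hard-coding the same id -> 1 entry:

import org.apache.spark.sql.functions.{lit, map, map_concat}

// map_concat merges two map columns (available since Spark 2.4)
origDF
  .withColumn("mapping", map_concat($"mapping", map(lit("id"), lit("1"))))
  .show(false)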

Hope this helps!

Upvotes: 2
