User9102d82
User9102d82

Reputation: 1190

how to update/transform/replace spark df column values using a hashmap

I want to replace the values of a given df column, using a hashmap but I am struggling with the syntax. Can someone please point me in the right direction or to an existing example? I have searched but not able to find something which sheds light on the exact subject.

Edit:

Imagine a dataframe like shown below:

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Simpson|BeerDrinker|
|      Marge| Simpson|  Housewife|
|       Bart| Simpson|        Son|
|       Lisa| Simpson|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

And I have a map of key-value pairs like shown below:

  type ValueMap = scala.collection.mutable.HashMap [String,String]
  var mymap = new ValueMap ()
  mymap += ("Simpson" -> "Surname")

I want to do an operation (which I am unable to figure out as of yet) and achieve a result like shown below. So basically in the column Pronoun, all the column values which equal Simpson have been replaced by its corresponding value from the map mymap which is Surname

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Surname|BeerDrinker|
|      Marge| Surname|  Housewife|
|       Bart| Surname|        Son|
|       Lisa| Surname|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

Upvotes: 1

Views: 1269

Answers (1)

Sathiyan S
Sathiyan S

Reputation: 1023

Try this approach with UDF,

val myMap = Map("Simpson" -> "Surname")
val df = Seq(("Homer","Simpson","BeerDrinker"),("Marge","Simpson","Housewife"),("Bart","Simpson","Son"),("Lisa","Simpson","Daughter"),("TheSimpsons","Simpsons","Family")).toDF("Noun","Pronoun","Adjective")

df.show(false)

-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Simpson |BeerDrinker|
|Marge      |Simpson |Housewife  |
|Bart       |Simpson |Son        |
|Lisa       |Simpson |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

val getVal = udf((x: String) => myMap.getOrElse(x, x))
val resDF = df.withColumn("Pronoun", getVal($"Pronoun"))

resDF.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

Let me know if this helps.

Updated:

Without UDF,

Added the map to DF as one more column

val df1 = df.withColumn("map", typedLit(myMap))
val df2 = df1.withColumn("Pronoun", when($"map"($"Pronoun").isNotNull, $"map"($"Pronoun")).otherwise($"Pronoun") ).drop("map")
df2.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

Another simple way instead of adding new column,

val colMap = typedLit(myMap)
val df3 = df.withColumn("Pronoun", when(colMap($"Pronoun").isNotNull, colMap($"Pronoun")).otherwise($"Pronoun") )
df3.show(false)

Upvotes: 6

Related Questions