Reputation: 1190
I want to replace the values of a given df column, using a hashmap but I am struggling with the syntax. Can someone please point me in the right direction or to an existing example? I have searched but not able to find something which sheds light on the exact subject.
Edit:
Imagine a dataframe like shown below:
+-----------+--------+-----------+
| Noun| Pronoun| Adjective|
+-----------+--------+-----------+
| Homer| Simpson|BeerDrinker|
| Marge| Simpson| Housewife|
| Bart| Simpson| Son|
| Lisa| Simpson| Daughter|
|TheSimpsons|Simpsons| Family|
+-----------+--------+-----------+
And I have a map of key-value pairs like shown below:
type ValueMap = scala.collection.mutable.HashMap [String,String]
var mymap = new ValueMap ()
mymap += ("Simpson" -> "Surname")
I want to do an operation (which I am unable to figure out as of yet) and achieve a result like shown below. So basically in the column Pronoun
, all the column values which equal Simpson
have been replaced by its corresponding value from the map mymap
which is Surname
+-----------+--------+-----------+
| Noun| Pronoun| Adjective|
+-----------+--------+-----------+
| Homer| Surname|BeerDrinker|
| Marge| Surname| Housewife|
| Bart| Surname| Son|
| Lisa| Surname| Daughter|
|TheSimpsons|Simpsons| Family|
+-----------+--------+-----------+
Upvotes: 1
Views: 1269
Reputation: 1023
Try this approach with UDF,
val myMap = Map("Simpson" -> "Surname")
val df = Seq(("Homer","Simpson","BeerDrinker"),("Marge","Simpson","Housewife"),("Bart","Simpson","Son"),("Lisa","Simpson","Daughter"),("TheSimpsons","Simpsons","Family")).toDF("Noun","Pronoun","Adjective")
df.show(false)
-----------+--------+-----------+
|Noun |Pronoun |Adjective |
+-----------+--------+-----------+
|Homer |Simpson |BeerDrinker|
|Marge |Simpson |Housewife |
|Bart |Simpson |Son |
|Lisa |Simpson |Daughter |
|TheSimpsons|Simpsons|Family |
+-----------+--------+-----------+
val getVal = udf((x: String) => myMap.getOrElse(x, x))
val resDF = df.withColumn("Pronoun", getVal($"Pronoun"))
resDF.show(false)
+-----------+--------+-----------+
|Noun |Pronoun |Adjective |
+-----------+--------+-----------+
|Homer |Surname |BeerDrinker|
|Marge |Surname |Housewife |
|Bart |Surname |Son |
|Lisa |Surname |Daughter |
|TheSimpsons|Simpsons|Family |
+-----------+--------+-----------+
Let me know if this helps.
Updated:
Without UDF,
Added the map to DF as one more column
val df1 = df.withColumn("map", typedLit(myMap))
val df2 = df1.withColumn("Pronoun", when($"map"($"Pronoun").isNotNull, $"map"($"Pronoun")).otherwise($"Pronoun") ).drop("map")
df2.show(false)
+-----------+--------+-----------+
|Noun |Pronoun |Adjective |
+-----------+--------+-----------+
|Homer |Surname |BeerDrinker|
|Marge |Surname |Housewife |
|Bart |Surname |Son |
|Lisa |Surname |Daughter |
|TheSimpsons|Simpsons|Family |
+-----------+--------+-----------+
Another simple way instead of adding new column,
val colMap = typedLit(myMap)
val df3 = df.withColumn("Pronoun", when(colMap($"Pronoun").isNotNull, colMap($"Pronoun")).otherwise($"Pronoun") )
df3.show(false)
Upvotes: 6