Knack

Reputation: 13

Change the column value if it is a certain string Spark Scala

I am trying to create a UDF to replace some values in a DF. I have the following DF:

df1
+-------------+
| Periodicity |
+-------------+
|  Monthly    |
|  Daily      |
|  Annual     |
+-------------+

If I find "Annual" in this DF, I want to change it to "EveryYear", and if I find "Daily", I want to change it to "EveryDay". This is what I am trying:

val modifyColumn = () => if (df1.col("Periodicity").equals("Annual")) "EveryYear"
val modifyColumnUDF = udf(modifyColumn)

val result = df1.withColumn("Periodicity", modifyColumnUDF(df1.col("Periodicity")))

But it is giving me an EvaluateException. What am I doing wrong?

Upvotes: 0

Views: 371

Answers (1)

Nir Hedvat

Reputation: 870

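In your attempt, df1.col("Periodicity").equals("Annual") compares a Column object with a String, so it is never true, and the lambda takes no arguments, so the UDF never receives the column value. You can use one of these approaches instead: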

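import org.apache.spark.sql.functions.{col, udf, when}

// First approach: chain when/otherwise on the column itself
dataFrame
  .withColumn("Periodicity",
    when(col("Periodicity") === "Annual", "EveryYear")
      .when(col("Periodicity") === "Monthly", "EveryMonth")
      .when(col("Periodicity") === "Daily", "EveryDay")
      .otherwise(col("Periodicity"))) // unmatched values are kept instead of becoming null

// Second approach: look the value up in a Map inside a UDF
val permutations = Map("Annual" -> "EveryYear", "Monthly" -> "EveryMonth", "Daily" -> "EveryDay")
// getOrElse keeps unmapped values; permutations(origValue) would throw on a missing key
val replaceUDF = udf[String, String]((origValue: String) => permutations.getOrElse(origValue, origValue))
dataFrame.withColumn("Periodicity", replaceUDF(col("Periodicity")))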

The second approach is useful if you have many permutations and/or want the mapping to be configured dynamically.
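If the mapping is just a lookup table, a UDF may not be needed at all. As a rough sketch, not part of the approaches above, the built-in na.replace takes a Map and swaps exact matches in the named column. The local SparkSession, the name `replacements`, and the extra "Weekly" row are assumptions added for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("periodicity-sketch")
  .getOrCreate()
import spark.implicits._

// Sample data from the question, plus a hypothetical "Weekly" row that has no mapping.
val df1 = Seq("Monthly", "Daily", "Annual", "Weekly").toDF("Periodicity")

// Hypothetical lookup table; it could equally be loaded from configuration.
val replacements = Map(
  "Annual"  -> "EveryYear",
  "Monthly" -> "EveryMonth",
  "Daily"   -> "EveryDay"
)

// na.replace substitutes exact matches and leaves all other values untouched.
val result = df1.na.replace("Periodicity", replacements)

// Collect the single string column back to the driver for inspection.
val values = result.as[String].collect().toSeq
```

Because this stays within Spark's built-in expressions, the optimizer can see through it, which is generally not the case for a UDF. Values missing from the map, such as "Weekly" here, pass through unchanged.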

Upvotes: 1
