Reputation: 369
I need to update value of dataframe column based on a string that isn't part of any other column in the dataframe. How do I do this?
For e.g. Let's say my dataframe has column A, B, C. I want to update value of column C based on combination of value in column A & a static string. I tried to do the following.
val df = originalDF.withColumn("C", Helper.dudf(df("A"), lit("str")))
My helper class as following
val addDummyColumn :(String, String)=>String=(input:String, recordType: String)=>{input}
val dummyUDF = udf(addDummyColumn)
My UDF that takes in variable A & recordType:
if(recordType.equals("TRANSACTION") {
if(A > 0 ) return "CHARGE";
else return "REFUND"
} else if (recordType.equals("CHARGEBACK") {
return "CHARGEBACK"
}
Example Input & Output:
Sample Input:
A=10, recordType=TRANSACTION
Output: C = CHARGE
A=-10, recordType=TRANSACTION
C = REFUND
A=10, recordType=CHARGEBACK
C = CHARGEBACK
My problem is that withColumn only accepts Column so I did lit("str") but I don't know how to extract value of that column in my UDF. Ideas?
Upvotes: 0
Views: 1381
Reputation: 23119
This is how you can use udf
and pass the columns and static strings
val addDummy = udf((A : String, recordType: String) => {
if(recordType.equals("TRANSACTION")) {
if(A.toInt > 0 )
"CHARGE"
else
"REFUND"
}else if (recordType.equals("CHARGEBACK")) {
"CHARGEBACK"
}else
"NONE"
})
Now call the udf
as below
val newDF = df.withColumn("newCol", addDummy($"A", lit("TRANSACTION")))
Hope this helps!
Upvotes: 1
Reputation: 41987
If column A is a IntegerType then you can define the udf
function as
val recordType: String = //"TRANSACTION" or "CHARGEBACK"
import org.apache.spark.sql.functions._
val dummyUDF = udf((A: Int, recordType: String) => {
if(recordType.equals("TRANSACTION")){
if(A > 0) "CHARGE" else "REFUND"
} else if (recordType.equals("CHARGEBACK"))
"CHARGEBACK"
else
"not known"
})
val df = originalDF.withColumn("C", dummyUDF(originalDF("A"), lit(recordType)))
Upvotes: 1