Reputation: 919
I have a Data-frame with column "age" of type String,I want to change values in following form.
Input Values for example
Age
=====
0
null
NaN
999
200
35
25-30
45
null
NaN
35-40
======
Output Required
Age
=====
0
999
0
999
999
35
27
45
999
0
37
======
Code I have tried until now
val formatted_df1 = df.withColumn("age", regexp_replace(col("age"), "null", "999")) -- This will change the value from null to 999
val formatted_df2 = formatted_df1.withColumn("age", regexp_replace(col("age"), "NaN", "0")) -- This will change the value from NaN to 0
Upvotes: 0
Views: 643
Reputation: 23099
You can write a simple UDF function to get the result
val scrubUdf = udf((value : String ) => {
value match {
case "NaN" => 0
case "null" => 999
case null => 999
case x if x.contains("-") => {
// (value.split("-")(0).toInt + value.split("-")(1).toInt) / 2
x.split("-").map(x=> x.toInt).sum / 2
}
case x if x.toInt >= 200 => 999
case _ => value.toInt
}
})
You can call the udf
as
df.withColumn("newAge", scrubUdf($"Age"))
Hope this helps!
Upvotes: 2