Bhavesh
Bhavesh

Reputation: 919

How to Split a Column in Data-frame and add the split values

I have a Data-frame with column "age" of type String,I want to change values in following form.

Input Values for example

Age
=====  
0
null
NaN
999
200
35
25-30
45
null
NaN
35-40
======

Output Required

Age
=====  
0
999
0
999
999
35
27
45
999
0
37
======

Code I have tried until now

val formatted_df1 = df.withColumn("age", regexp_replace(col("age"), "null", "999")) -- This will change the value from null to 999
val formatted_df2 = formatted_df1.withColumn("age", regexp_replace(col("age"), "NaN", "0")) -- This will change the value from NaN to 0

Upvotes: 0

Views: 643

Answers (1)

koiralo
koiralo

Reputation: 23099

You can write a simple UDF function to get the result

val scrubUdf = udf((value : String ) => {
  value match {
    case "NaN"  => 0
    case "null" => 999
    case null   => 999
    case x if x.contains("-") => {
      //          (value.split("-")(0).toInt + value.split("-")(1).toInt) / 2
      x.split("-").map(x=> x.toInt).sum / 2
    }
    case x if x.toInt >= 200 => 999
    case _ => value.toInt
  }
})

You can call the udf as

   df.withColumn("newAge", scrubUdf($"Age"))

Hope this helps!

Upvotes: 2

Related Questions