HHH

Reputation: 6465

How to impute NULL values to zero in Spark/Scala

I have a DataFrame in which some columns are of type String and contain "NULL" as a string value (not as an actual null). I want to impute them with zero. Apparently df.na.fill(0) doesn't work. How can I impute them with zero?
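
To make the problem concrete, here is a minimal sketch of such a DataFrame (the column names and data are assumptions for illustration, not from the original question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("nullStringExample").getOrCreate()
import spark.implicits._

// String columns holding the literal text "NULL" instead of real null values
val df = Seq(("a", "NULL"), ("b", "2"), ("NULL", "3")).toDF("col1", "col2")

// df.na.fill(0) only fills actual nulls in numeric columns,
// so these "NULL" strings are left untouched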

Upvotes: 1

Views: 2057

Answers (1)

mtoto

Reputation: 24178

You can use replace() from DataFrameNaFunctions; these functions are accessed via the .na prefix:

val df1 = df.na.replace("*", Map("NULL" -> "0"))
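
Applied to the kind of DataFrame described in the question (sample data assumed for illustration), every literal "NULL" in a string column becomes "0":

// Assuming the example DataFrame sketched in the question
df.na.replace("*", Map("NULL" -> "0")).show()
// +----+----+
// |col1|col2|
// +----+----+
// |   a|   0|
// |   b|   2|
// |   0|   3|
// +----+----+

The "*" argument targets every column; since the replacement map is String -> String, only string columns are affected.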

You could also create your own udf that replicates this behaviour:

import org.apache.spark.sql.functions.{col, udf}

// Replace the literal string "NULL" with "0"; leave every other value untouched
val nullReplacer = udf((x: String) => {
  if (x == "NULL") "0"
  else x
})

val df1 = df.select(df.columns.map(c => nullReplacer(col(c)).alias(c)): _*)

However, this is superfluous: it does the same thing as the above, at the cost of more code than necessary.

Upvotes: 1
