Reputation: 6465
I have a DataFrame in which some columns are of type String and contain NULL as a string value (not as an actual null). I want to impute them with zero. Apparently df.na.fill(0) doesn't work. How can I impute them with zero?
Upvotes: 1
Views: 2057
Reputation: 24178
You can use replace() from DataFrameNaFunctions, which is accessed through the .na prefix. Passing "*" as the column applies the replacement to every string column:
val df1 = df.na.replace("*", Map("NULL" -> "0"))
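Note that the replacement above leaves the column as a string holding "0". If a numeric zero is wanted, one possible follow-up is to cast the affected columns afterwards. The following is only a minimal sketch: the object name, the imputeZero helper, and the id/amount columns are made up for illustration, and it assumes the remaining values in those columns are valid integer strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NullStringDemo {
  // Hypothetical helper: replace the literal string "NULL" with "0" in the
  // given columns, then cast those columns to int so the zero is numeric.
  def imputeZero(df: DataFrame, cols: Seq[String]): DataFrame = {
    val replaced = df.na.replace(cols, Map("NULL" -> "0"))
    cols.foldLeft(replaced)((acc, c) => acc.withColumn(c, col(c).cast("int")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-string-demo")
      .getOrCreate()
    import spark.implicits._

    // Toy data: "amount" is a string column containing the text NULL.
    val df = Seq(("a", "NULL"), ("b", "7")).toDF("id", "amount")

    val df1 = imputeZero(df, Seq("amount"))
    df1.printSchema()
    df1.show()

    spark.stop()
  }
}
```

Passing an explicit list of columns instead of "*" keeps the cast limited to the columns you actually want to treat as numbers.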
You could also create your own UDF that replicates this behaviour:
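import org.apache.spark.sql.functions.{col, udf}
// Returns "0" for the literal string "NULL" and leaves every other value unchanged.
val nullReplacer = udf((x: String) => {
  if (x == "NULL") "0"
  else x
})
// Apply the UDF to every column, keeping the original column names.
val df1 = df.select(df.columns.map(c => nullReplacer(col(c)).alias(c)): _*)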
However, this would be superfluous: it does the same as the above, at the cost of more code than necessary.
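If you do want to write the column-by-column logic out by hand, a variant that avoids a UDF is to use the built-in when/otherwise column functions. This is a different technique from the two shown above, offered only as a sketch: the object and method names are hypothetical, and it only touches columns whose type is StringType.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

object WhenOtherwiseDemo {
  // Hypothetical helper: for every string column, swap the literal "NULL"
  // for "0"; all other columns and values pass through unchanged.
  def replaceNullStrings(df: DataFrame): DataFrame = {
    val projected = df.schema.fields.map { f =>
      if (f.dataType == StringType)
        when(col(f.name) === "NULL", lit("0")).otherwise(col(f.name)).alias(f.name)
      else
        col(f.name)
    }
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("when-otherwise-demo")
      .getOrCreate()
    import spark.implicits._

    // Toy data: two string columns and one integer column.
    val df = Seq(("a", "NULL", 1), ("NULL", "b", 2)).toDF("c1", "c2", "n")

    replaceNullStrings(df).show()

    spark.stop()
  }
}
```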
Upvotes: 1