Reputation: 141
I am trying to impute empty values with NA. The code works fine in plain Scala, but when I run it on Spark it does not work.
/* First approach */
def blankImputation(input: String): String = {
  val pattern2 = """(^.*?,,+.*$)""".r
  if (pattern2.findFirstIn(input).contains(",,")) {
    return pattern2.replaceAllIn(input, ",NA,")
  }
  input
}
val cleaned_df = inputFile.map(blankImputation)
/* Second approach */
def blankImputation(input: String): String = {
  val pattern2 = """(^.*?,,+.*$)""".r
  if (input.isEmpty) {
    return "NA"
  }
  input
}
val cleaned_df = inputFile.map(blankImputation)
cleaned_df.toDF().collect()
I expect NA in place of the empty values.
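For reference, one way to replace every empty field in plain Scala is to split each line with a negative limit (so trailing empty fields are kept) and map over the fields. This is a minimal sketch, assuming comma-delimited lines with no quoted fields:

```scala
// Replace every empty field in a comma-delimited line with "NA".
// split(",", -1) keeps leading and trailing empty fields.
def imputeLine(line: String): String =
  line.split(",", -1).map(f => if (f.isEmpty) "NA" else f).mkString(",")

// imputeLine("a,,b,,") yields "a,NA,b,NA,NA"
```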
Upvotes: 1
Views: 294
Reputation: 141
Thanks, Shankar, for your effort. I was able to impute the missing values after following these steps: 1. I loaded the CSV file into a DataFrame. 2. After loading into the DataFrame, the empty values were read as null, so I imputed the null values with this code:
val nullReplacer = udf((x: String) => { if (x == null) "N" else x })
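Applied to a whole DataFrame, a UDF like the one above can be folded over every column. This is a minimal sketch, assuming an active SparkSession named `spark`, a header row, and a hypothetical file path `input.csv`; it replaces nulls with "NA" as the question expects:

```scala
import org.apache.spark.sql.functions.udf

val nullReplacer = udf((x: String) => if (x == null) "NA" else x)

// With no schema supplied, every CSV column is read as a string,
// so the same string UDF can be applied to each column in turn.
val df = spark.read.option("header", "true").csv("input.csv")
val imputed = df.columns.foldLeft(df)((d, c) => d.withColumn(c, nullReplacer(d(c))))
imputed.show()
```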
Upvotes: 1