RAVI MISHRA

Reputation: 141

Empty value imputation in Spark using Scala

I am trying to impute empty values with NA. The code works fine in plain Scala, but when I run it in Spark it does not work.

/* first way: */
def blankImputation(input: String): String = {
    val pattern2 =  """(^.*?,,+.*$)""".r;
    if (pattern2.findFirstIn(input).contains(",,")) {
        return pattern2.replaceAllIn(input, ",NA,");
    }
    return input;
}

var cleaned_df = inputFile.map(blankImputation)


/* second way: */
def blankImputation(input: String): String = {
    val pattern2 =  """(^.*?,,+.*$)""".r;
    if (input.isEmpty()) {
        return "NA";
    }
    return input;
}

var cleaned_df = inputFile.map(blankImputation)
cleaned_df.toDF().collect()

I expect NA instead of Empty values.

Upvotes: 1

Views: 294

Answers (1)

RAVI MISHRA

Reputation: 141

Thanks Shankar for your effort. I was able to impute the missing values after following these steps:

1. I loaded the CSV file into a DataFrame.
2. After loading into the DataFrame, the empty values were replaced by null, so I imputed the null values using this code:

import org.apache.spark.sql.functions.udf

// Replace null fields with "NA"; pass everything else through unchanged.
val nullReplacer = udf((x: String) => if (x == null) "NA" else x)
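For reference, the two steps above can be sketched end to end as below. This is a minimal sketch, assuming Spark 2.x and a hypothetical `input.csv` path; it also shows `df.na.fill`, Spark's built-in way to fill nulls without writing a UDF:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ImputeNA {
  // Pure helper behind the UDF: null becomes "NA", anything else is unchanged.
  val replaceNull: String => String = x => if (x == null) "NA" else x

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ImputeNA").getOrCreate()

    // Hypothetical input path; the csv reader maps empty fields to null.
    val df = spark.read.option("header", "true").csv("input.csv")

    // Apply the UDF to every column ...
    val nullReplacer = udf(replaceNull)
    val viaUdf = df.select(df.columns.map(c => nullReplacer(df(c)).as(c)): _*)

    // ... or, more idiomatically, fill all string columns in one call.
    val viaNaFill = df.na.fill("NA")

    viaNaFill.show()
    spark.stop()
  }
}
```

`na.fill("NA")` only touches string columns, which matches the goal here; the UDF route is useful when the replacement logic is more involved than a constant.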

Upvotes: 1
