Luckylukee

Reputation: 595

Filling null values with the mean of the column in HiveQL and Spark

I am using HiveQL in Spark and would like to fill null values with the mean of the column.

I am using the code below:

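    StringBuilder query = new StringBuilder("select `ts0` as ts ");
    String[] cols = dataFrame.columns();

    for (String col : cols) {
        // Assumption: trimmedCol is the column name with surrounding whitespace
        // removed; its definition was not shown in the original snippet.
        String trimmedCol = col.trim();
        query.append(",`" + col + "` as " + trimmedCol);
    }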

I think I should use a "case" expression to handle the null values. Can anyone show me how to do this?

Upvotes: 0

Views: 94

Answers (1)

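You could try the following, which uses the DataFrame `na` functions. Note that 10.0 here is just a fixed fill value; to fill with the mean, compute the mean of the column first and pass it in place of 10.0.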

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv")


scala> df.show()

scala> df.na.fill(10.0,Seq("age"))


scala> df.na.fill(10.0,Seq("age")).show




scala> df.na.replace("age", Map(35 -> 61,24 -> 12)).show()
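
If you would rather stay in HiveQL, as in the question, one possible approach is to wrap each column in `coalesce(col, avg(col) over ())`. This is equivalent to `case when col is null then avg(col) over () else col end`. Below is a minimal sketch of building such a query in Java. It assumes every column other than `ts0` is numeric, that the DataFrame has been registered as a temp table (the name `people` is hypothetical), and that your context supports window functions (in older Spark versions that means a HiveContext). The class and method names are made up for illustration.

```java
class MeanFillQuery {

    // Builds a HiveQL SELECT that replaces NULLs in each column with that column's mean.
    // Assumptions: all columns except "ts0" are numeric, and tableName is a
    // hypothetical temp table the DataFrame was registered under.
    static String build(String[] cols, String tableName) {
        StringBuilder query = new StringBuilder("select `ts0` as ts");
        for (String col : cols) {
            if (col.equals("ts0")) {
                continue; // already selected as ts
            }
            // avg() skips NULLs, and the empty over () makes it a window over the
            // whole table, so the mean is available on every row without a GROUP BY.
            query.append(", coalesce(`").append(col)
                 .append("`, avg(`").append(col).append("`) over ()) as `")
                 .append(col).append("`");
        }
        query.append(" from `").append(tableName).append("`");
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(new String[] {"ts0", "age"}, "people"));
    }
}
```

For the columns `ts0` and `age` this prints ``select `ts0` as ts, coalesce(`age`, avg(`age`) over ()) as `age` from `people` ``. You would then pass the resulting string to `sqlContext.sql(...)`.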

Upvotes: 1
