Balaji Krishnan

Reputation: 457

Spark Scala - Handling empty DataFrame

I have a specific requirement where I need to check for an empty DataFrame. If it is empty, I want to populate a default value. Here is what I tried, but it doesn't give me what I want.

    def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
      if (!df.rdd.isEmpty()) df
      else df.na.fill(0, Seq(col))
    }

val age = checkNotEmpty(w_feature_md.filter("age='22'").select("age_index"),"age_index")

The idea is to return the df if it is not empty, and to fill in a default value of zero if it is empty. This doesn't seem to work. The following is what I am getting:

scala> age.show
+---------+
|age_index|
+---------+
+---------+

Please help.

Upvotes: 2

Views: 4284

Answers (1)

Pawan B

Reputation: 4623

    def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
      if (!df.rdd.isEmpty()) df
      else df.na.fill(0, Seq(col))
    }

In your method, control goes to the if branch when the df is not empty, and to the else branch when the df is empty.

df.na (org.apache.spark.sql.DataFrameNaFunctions) provides functionality for working with missing data in DataFrames. na.fill only replaces nulls in rows that already exist; since you are calling it on an empty DataFrame, there are no rows in which anything can be replaced, so the result is always empty.
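If you want the default to actually appear, one option is to construct a one-row DataFrame holding the default value whenever the input is empty, instead of filling rows that don't exist. Here is a minimal sketch (the helper name withDefaultIfEmpty is mine; it assumes Spark 2.x, where a Dataset exposes its sparkSession, and that the column is numeric):

    import org.apache.spark.sql.DataFrame

    def withDefaultIfEmpty(df: DataFrame, colName: String, default: Double = 0.0): DataFrame = {
      val spark = df.sparkSession
      import spark.implicits._
      // head(1) avoids the RDD conversion that df.rdd.isEmpty() triggers
      if (df.head(1).nonEmpty) df
      // build a fresh one-row, one-column DataFrame carrying the default
      else Seq(default).toDF(colName)
    }

    val age = withDefaultIfEmpty(
      w_feature_md.filter("age='22'").select("age_index"),
      "age_index"
    )
    // age.show now prints a single 0.0 row when the filter matches nothing

Note that the fallback DataFrame only has the one named column; if you need the input's full schema, you would have to build the row against df.schema instead.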

Check this question for more on replacing null values in a DataFrame.

Upvotes: 2
