Reputation: 457
I have a specific requirement wherein I need to check for an empty DataFrame. If it is empty, I want to populate a default value. Here is what I tried, but it doesn't give me what I want.
def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
  if (!df.rdd.isEmpty()) df
  else df.na.fill(0, Seq(col))
}
val age = checkNotEmpty(w_feature_md.filter("age='22'").select("age_index"),"age_index")
The idea is to get the df if it is not empty, and if it is empty, to fill in a default value of zero. This doesn't seem to work. The following is what I am getting:
scala> age.show
+---------+
|age_index|
+---------+
+---------+
Please help.
Upvotes: 2
Views: 4284
Reputation: 4623
def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
  if (!df.rdd.isEmpty()) df
  else df.na.fill(0, Seq(col))
}
In your method, control goes to the if branch when the df is not empty, and to the else branch when the df is empty.

df.na (org.apache.spark.sql.DataFrameNaFunctions) provides functionality for working with missing data in DataFrames. Its fill method replaces null values in the rows the DataFrame already has. Since you are calling df.na on an empty DataFrame, there are no rows, hence nothing to replace, and the result is always empty.
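A minimal sketch of that behaviour (assuming a SparkSession in scope as spark; the sample data here is made up, not the asker's):

```scala
import spark.implicits._

// Two rows, one with a null: fill replaces the null with 0.
val df = Seq(("a", None: Option[Int]), ("b", Some(3))).toDF("k", "v")
df.na.fill(0, Seq("v")).show()   // row "a" now carries v = 0

// Zero rows: fill has no rows to rewrite, so the result stays empty.
val empty = df.filter("k = 'zzz'")
empty.na.fill(0, Seq("v")).show()   // prints only the header
```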
Check this question for more on replacing null values in a DataFrame.
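To get the default you want, you have to create the row yourself rather than fill nulls that don't exist. A sketch under two assumptions (a SparkSession named spark is in scope, and the column is numeric, as age_index produced by a StringIndexer would be):

```scala
import org.apache.spark.sql.DataFrame

def checkNotEmpty(df: DataFrame, col: String): DataFrame = {
  import spark.implicits._
  // head(1) is a cheaper emptiness check than df.rdd.isEmpty(),
  // which converts to an RDD and runs a full job.
  if (df.head(1).nonEmpty) df
  else Seq(0.0).toDF(col)   // build a one-row DataFrame holding the default 0
}
```

Note that the fallback DataFrame is built from scratch, so it only shares the column name with the original; if you need the exact original schema preserved, construct the row against df.schema instead.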
Upvotes: 2