Arek

Reputation: 25

PySpark groupby and count null values

PySpark Dataframe Groupby and Count Null Values

Referring to the solution linked above, I am trying to apply the same logic, but with groupby("country") to get the null count of another column, and I am getting a "Column is not iterable" error. Can someone help with this?

df7.groupby("country").agg(*(sum(col(c).isNull().cast("int")).alias(c) for c in columns))
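For reference, "Column is not iterable" is the error PySpark raises when something tries to iterate over a Column, which suggests the sum here is Python's builtin rather than pyspark.sql.functions.sum. A minimal sketch of the corrected aggregation, assuming columns is meant to be df7.columns:

from pyspark.sql import functions as F

# Use Spark's sum, not the Python builtin, so the expression stays a
# Column aggregation instead of being iterated over.
df7.groupby("country").agg(
    *[F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df7.columns]
)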

Upvotes: 1

Views: 1608

Answers (1)

Suyog Shimpi

Reputation: 866

from pyspark.sql import functions as funcs

# For every column, count the rows where the value is NaN or null.
covid_india_df.select(
    [
        funcs.count(
            funcs.when((funcs.isnan(clm) | funcs.col(clm).isNull()), clm)
        ).alias(clm) for clm in covid_india_df.columns
    ]
).show()

The above approach may help you get the correct results. Check here for a complete example.
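For the groupBy part of the question, the same pattern can be moved into an aggregation. A sketch using the question's names (df7 and "country"), keeping in mind that funcs.isnan is only meaningful for float/double columns:

from pyspark.sql import functions as funcs

# Per-country counts of NaN/null values for every other column.
df7.groupby("country").agg(
    *[
        funcs.count(
            funcs.when(funcs.isnan(c) | funcs.col(c).isNull(), c)
        ).alias(c)
        for c in df7.columns if c != "country"
    ]
).show()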

Upvotes: 1
