Arek

Reputation: 25

PySpark groupby and count null values

PySpark Dataframe Groupby and Count Null Values

Referring to the solution linked above, I am trying to apply the same logic, but with groupby("country") to get the null count of another column, and I am getting a "Column is not iterable" error. Can someone help with this?

df7.groupby("country").agg(*(sum(col(c).isNull().cast("int")).alias(c) for c in columns))
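For reference, "Column is not iterable" is the error PySpark raises when something tries to iterate over a Column, which suggests the sum here is Python's builtin rather than pyspark.sql.functions.sum. A minimal sketch of the corrected aggregation, assuming columns is meant to be df7.columns:

from pyspark.sql import functions as F

# Use Spark's sum, not the Python builtin, so the expression stays a
# Column aggregation instead of being iterated over.
df7.groupby("country").agg(
    *[F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df7.columns]
)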

Upvotes: 1

Views: 1608

Answers (1)

Suyog Shimpi

Reputation: 866

from pyspark.sql import functions as funcs

# For every column, count the rows where the value is NaN or null.
covid_india_df.select(
    [
        funcs.count(
            funcs.when((funcs.isnan(clm) | funcs.col(clm).isNull()), clm)
        ).alias(clm) for clm in covid_india_df.columns
    ]
).show()

The above approach may help you get the correct results. Check here for a complete example.
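For the groupBy part of the question, the same pattern can be moved into an aggregation. A sketch using the question's names (df7 and "country"), keeping in mind that funcs.isnan is only meaningful for float/double columns:

from pyspark.sql import functions as funcs

# Per-country counts of NaN/null values for every other column.
df7.groupby("country").agg(
    *[
        funcs.count(
            funcs.when(funcs.isnan(c) | funcs.col(c).isNull(), c)
        ).alias(c)
        for c in df7.columns if c != "country"
    ]
).show()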

Upvotes: 1
