Reputation: 25
PySpark DataFrame GroupBy and Count Null Values
Referring to the solution link above, I am trying to apply the same logic, but with groupby("country") to get the null count of another column, and I am getting a "column is not iterable" error. Can someone help with this?
df7.groupby("country").agg(*(sum(col(c).isNull().cast("int")).alias(c) for c in columns))
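(For context: this error typically comes from Python's built-in sum shadowing pyspark.sql.functions.sum; the built-in tries to iterate over the Column object, which raises "Column is not iterable". A minimal sketch of the working aggregation, using toy data since df7 and columns are not shown in the question:)

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as sum_  # aliased to avoid shadowing the built-in sum

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for df7; the column names and values here are hypothetical
df7 = spark.createDataFrame(
    [("IN", 5), ("IN", None), ("US", None)],
    ["country", "cases"],
)
columns = ["cases"]

# Per country, count the null values in each listed column
df7.groupBy("country").agg(
    *[sum_(col(c).isNull().cast("int")).alias(c) for c in columns]
).show()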
Upvotes: 1
Views: 1608
Reputation: 866
import pyspark.sql.functions as funcs

# For every column, count the rows whose value is NaN or null
covid_india_df.select(
    [
        funcs.count(
            funcs.when(funcs.isnan(clm) | funcs.col(clm).isNull(), clm)
        ).alias(clm)
        for clm in covid_india_df.columns
    ]
).show()
The above approach may help you get correct results. Check here for a complete example.
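If you need the null counts per group rather than per column over the whole DataFrame, the same count/when pattern can be moved into the aggregation. A sketch, assuming the covid_india_df layout above and a hypothetical "state" column as the grouping key (note that funcs.isnan only applies to numeric columns, as in the original snippet):

# Hypothetical grouping column "state"; swap in whichever key applies
covid_india_df.groupBy("state").agg(
    *[
        funcs.count(
            funcs.when(funcs.isnan(clm) | funcs.col(clm).isNull(), clm)
        ).alias(clm)
        for clm in covid_india_df.columns
        if clm != "state"  # skip the grouping key itself
    ]
).show()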
Upvotes: 1