Julia K

Reputation: 407

Pyspark / Spark: Drop groups that don't contain a certain value

I need your help with a Spark/Pyspark question. I have a Spark DataFrame which looks like this. I want to group the dataframe by the name column. How can I only keep those groups that contain at least one nickname 'X'?

import pandas as pd

df = pd.DataFrame({"name": ["A", "A", "B", "B", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "Y", "Y"]})

This question has been answered for Pandas with the filter function. However, PySpark does not seem to support groupBy().filter().
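For reference, the Pandas solution I mean looks roughly like this (a sketch using the DataFrame above, before converting it to Spark):

# keep only the groups that contain at least one nickname 'X'
df.groupby("name").filter(lambda g: (g["nickname"] == "X").any())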

Any ideas? Thank you very much.

Upvotes: 0

Views: 444

Answers (1)

aamirmalik124

Reputation: 125

# keep only the (name, nickname) groups whose nickname is 'X'
df = df.groupby('name', 'nickname').count().filter("nickname = 'X'")
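This yields the names that have at least one nickname 'X', together with a count. To keep every original row of those groups, one option is a semi join back to the original DataFrame (a sketch, using the question's df before the groupby above):

# names that have at least one nickname 'X'
names_with_x = df.filter("nickname = 'X'").select('name').distinct()
# keep all rows whose name appears in that list
result = df.join(names_with_x, on='name', how='leftsemi')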

Upvotes: 1
