Reputation: 407
I need help with a Spark/PySpark question. I have a Spark DataFrame that looks like the example below, and I want to group it by the name
column. How can I keep only those groups that contain at least one nickname
'X'?
# Example data, built with pandas for illustration
import pandas as pd
df = pd.DataFrame({"name": ["A", "A", "B", "B", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "Y", "Y"]})
This question has been answered for pandas using the filter
function. However, PySpark does not seem to support groupBy().filter().
Any ideas? Thank you very much.
Upvotes: 0
Views: 444
Reputation: 125
df = df.groupBy('name', 'nickname').count().filter("nickname = 'X'")  # example condition; substitute your own
Upvotes: 1