Reputation: 419
Starting with a DataFrame that looks like this:
userId | preference
1 | coffee
2 | cake
2 | tea
3 | tea
3 | tea
3 | tea
4 | apple
4 | tea
I need to transform the above into this:
userId | preference
2 | cake
2 | tea
4 | apple
4 | tea
Note that userId 1 and userId 3 were dropped because they each had only one unique preference. I only want to keep userIds that have 2 or more unique preferences. I've been stuck on this; I tried using .groupby but got nowhere.
Upvotes: 0
Views: 623
Reputation: 4607
A group-wise filter would also work for your case:
df.groupby('userId').filter(lambda x: x['preference'].nunique()>1)
Out:
   userId preference
1       2       cake
2       2        tea
6       4      apple
7       4        tea
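For anyone who wants to run this end to end, here is a minimal sketch of the filter approach. The DataFrame is rebuilt from the table in the question, and the default 0..7 integer index is an assumption on my part (it matches the index labels in the output above).

```python
import pandas as pd

# Sample data rebuilt from the question; the default integer index is assumed.
df = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3, 3, 4, 4],
    "preference": ["coffee", "cake", "tea", "tea", "tea", "tea", "apple", "tea"],
})

# filter() calls the lambda once per userId group and keeps every row of the
# groups for which it returns True, here: more than one distinct preference.
result = df.groupby("userId").filter(lambda g: g["preference"].nunique() > 1)
print(result)
```

The rows keep their original index labels and order, so userIds 1 and 3 simply drop out.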
Upvotes: 1
Reputation: 28709
Get the count of unique preference values per userId, then keep the rows where that count is greater than 1 and discard the rest.
df.loc[df.groupby('userId').preference.transform('nunique').gt(1)]
Out[26]:
   userId preference
1       2       cake
2       2        tea
6       4      apple
7       4        tea
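To make the intermediate steps visible, the one-liner can be split up roughly like this. It is only a sketch: the DataFrame is rebuilt from the question's table, and the names unique_counts and mask are mine, not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the question; the default integer index is assumed.
df = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3, 3, 4, 4],
    "preference": ["coffee", "cake", "tea", "tea", "tea", "tea", "apple", "tea"],
})

# transform('nunique') broadcasts each group's distinct-preference count
# back onto every row of that group, so the result aligns with df's index.
unique_counts = df.groupby("userId")["preference"].transform("nunique")

# Boolean mask: True for rows whose user has more than one distinct preference.
mask = unique_counts.gt(1)

result = df.loc[mask]
print(result)
```

Because transform returns a Series the same length as df, the mask can be reused for other things (for example, selecting the dropped rows with ~mask) without grouping again.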
Upvotes: 2