user5844628

Reputation: 419

Pandas: Get all ids that have more than one corresponding value

Starting with a DataFrame that looks like this:

userId | preference
1      | coffee
2      | cake 
2      | tea
3      | tea
3      | tea
3      | tea
4      | apple
4      | tea

I need to transform the above into this:

userId | preference
2      | cake 
2      | tea
4      | apple
4      | tea

Note that userId 1 and userId 3 were dropped because each has only one unique preference. I only want to keep userIds that have 2 or more unique preferences. I've been stuck on this: I tried using .groupby but got nowhere.
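For anyone wanting to reproduce this, here is a minimal sketch of the sample data plus the per-user count of distinct preferences, which is the quantity the filter is based on. The variable names (`df`, `unique_counts`, `ids_to_keep`) are my own assumptions, and the integer dtype for userId is a guess since the original frame was only shown as a table.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (dtypes are assumed)
df = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3, 3, 4, 4],
    "preference": ["coffee", "cake", "tea", "tea", "tea", "tea", "apple", "tea"],
})

# Number of distinct preferences per userId
unique_counts = df.groupby("userId")["preference"].nunique()
print(unique_counts)

# userIds with 2 or more distinct preferences
ids_to_keep = unique_counts[unique_counts > 1].index.tolist()
print(ids_to_keep)
```

userId 3 appears three times but always with "tea", so its distinct count is 1 and it should be dropped along with userId 1.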

Upvotes: 0

Views: 623

Answers (2)

Naga kiran

Reputation: 4607

A group-wise filter should also work for your case: keep only the groups whose preference column has more than one unique value.

df.groupby('userId').filter(lambda x: x['preference'].nunique()>1)

Out:

   userId | preference
1  2      | cake
2  2      | tea
6  4      | apple
7  4      | tea
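A self-contained version of the group-wise filter, in case it helps to run it end to end. The sample frame is rebuilt from the question, and the helper name `has_multiple_preferences` is hypothetical; it only replaces the lambda for readability.

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are assumed)
df = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3, 3, 4, 4],
    "preference": ["coffee", "cake", "tea", "tea", "tea", "tea", "apple", "tea"],
})

def has_multiple_preferences(group):
    # True when this userId's rows contain more than one distinct preference
    return group["preference"].nunique() > 1

# filter() keeps every row of each group for which the function returns True
filtered = df.groupby("userId").filter(has_multiple_preferences)
print(filtered)
```

The surviving rows keep their original index labels (1, 2, 6, 7); chaining `.reset_index(drop=True)` would renumber them from 0 if that is preferred.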

Upvotes: 1

sammywemmy

Reputation: 28709

Get the count of unique preference values per userId, then keep the rows where that count is greater than 1 and discard the rest.

df.loc[df.groupby('userId').preference.transform('nunique').gt(1)]
Out[26]: 
   userId preference
1       2       cake
2       2        tea
6       4      apple
7       4        tea
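To make the intermediate step visible, here is a sketch that splits the one-liner into the broadcast counts, the boolean mask, and the final selection. The sample frame is rebuilt from the question, and the names `counts`, `mask`, and `result` are my own.

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are assumed)
df = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3, 3, 4, 4],
    "preference": ["coffee", "cake", "tea", "tea", "tea", "tea", "apple", "tea"],
})

# transform() returns one value per original row: the distinct-preference
# count of the userId that row belongs to
counts = df.groupby("userId")["preference"].transform("nunique")

# Boolean mask aligned with df's index
mask = counts.gt(1)

# Keep only the rows whose userId has more than one distinct preference
result = df.loc[mask]
print(result)
```

Because `transform` produces a Series aligned with the original index, the mask can be reused or combined with other conditions. On frames with many groups this is often faster than a Python-level lambda per group, though that is worth measuring on your own data.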

Upvotes: 2
