swchen
swchen

Reputation: 783

Find the row that contain multiple values in other column

Question

I have a dataframe that have two columns: NO and CODE. I am trying to find NO that contain two type of CODE. How can I make it?

The codes look like:

import pandas as pd

df = pd.DataFrame({'NO': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e', 'e', 'f', 'f'], 'CODE': ['1', '1', '1', '1', '4', '1', '4', '1', '4', '4', '1', '4', '1']})

Out[1]: 
   NO CODE
0   a    1
1   a    1
2   b    1
3   b    1
4   c    4
5   c    1
6   d    4
7   d    1
8   e    4
9   e    4
10  e    1
11  f    4
12  f    1

Expected result

NO CODE
c  4
c  1
d  4
d  1
e  4
e  4
e  1
f  4
f  1

Any help will be thankful.

Upvotes: 0

Views: 769

Answers (1)

sammywemmy
sammywemmy

Reputation: 28644

Use transform and nunique to get number of unique values for CODE per NO.
Keep only rows that are greater than one and drop the transform column.

df['gt_1'] = df.groupby('NO').CODE.transform('nunique')

df.query('gt_1 > 1').drop('gt_1',axis=1).reset_index(drop=True)

    NO  CODE
0   c   4
1   c   1
2   d   4
3   d   1
4   e   4
5   e   4
6   e   1
7   f   4
8   f   1

Another option: Use the filter function in groupby to get only rows where unique values is greater than one, then use the iloc method to extract only rows that meet criteria.

check = df.groupby('NO').CODE.filter(lambda x:x.nunique()>1).index

df.iloc[check]

    NO  CODE
4   c   4
5   c   1
6   d   4
7   d   1
8   e   4
9   e   4
10  e   1
11  f   4
12  f   1

Upvotes: 1

Related Questions