Reputation: 783
I have a dataframe that have two columns: NO
and CODE
. I am trying to find NO
that contain two type of CODE
. How can I make it?
import pandas as pd
df = pd.DataFrame({'NO': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e', 'e', 'f', 'f'], 'CODE': ['1', '1', '1', '1', '4', '1', '4', '1', '4', '4', '1', '4', '1']})
Out[1]:
NO CODE
0 a 1
1 a 1
2 b 1
3 b 1
4 c 4
5 c 1
6 d 4
7 d 1
8 e 4
9 e 4
10 e 1
11 f 4
12 f 1
NO CODE
c 4
c 1
d 4
d 1
e 4
e 4
e 1
f 4
f 1
Any help will be thankful.
Upvotes: 0
Views: 769
Reputation: 28644
Use transform and nunique to get number of unique values for CODE per NO.
Keep only rows that are greater than one and drop the transform column.
df['gt_1'] = df.groupby('NO').CODE.transform('nunique')
df.query('gt_1 > 1').drop('gt_1',axis=1).reset_index(drop=True)
NO CODE
0 c 4
1 c 1
2 d 4
3 d 1
4 e 4
5 e 4
6 e 1
7 f 4
8 f 1
Another option: Use the filter function in groupby to get only rows where unique values is greater than one, then use the iloc method to extract only rows that meet criteria.
check = df.groupby('NO').CODE.filter(lambda x:x.nunique()>1).index
df.iloc[check]
NO CODE
4 c 4
5 c 1
6 d 4
7 d 1
8 e 4
9 e 4
10 e 1
11 f 4
12 f 1
Upvotes: 1