Reputation: 567
I have a dataframe that looks like -
ID NAME AGE
1 ABC 10
2 XYZ 12
3 PQR 20
4 ABC 25
5 XYZ 30
6 PQR 20
7 KLM 22
8 NOP 16
I would like to re-arrange and keep only the following rows based on the duplicate values on NAME
column -
ID NAME AGE
1 ABC 10
2 ABC 25
3 XYZ 12
4 XYZ 30
5 PQR 20
6 PQR 20
with following rows removed -
7 KLM 22
8 NOP 16
Thanks!
Upvotes: 1
Views: 113
Reputation: 1614
While the most simplest way is the df.duplicated
solution, here is another take on this using groupby
and transform
df[(df.groupby('NAME')['AGE'].transform('size')>1)].sort_values('NAME')
Tried performance comparison with df.duplicated
for curiosity and it about the same for both.
df=pd.concat([df]*1000000)
%timeit df1 = df[(df.groupby('NAME')['AGE'].transform('size')>1)].sort_values('NAME')
15.8 s ± 475 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df1 = df[df.duplicated(subset=['NAME'], keep=False)].sort_values(by='NAME')
15.9 s ± 632 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 0
Reputation: 26676
Please try
df[df.duplicated(subset=['NAME'], keep=False)].sort_values(by='NAME')
Upvotes: 2