Reputation: 995
How can I get the rows by distinct values in COL2?
For example, I have the dataframe below:
COL1 COL2
a.com 22
b.com 45
c.com 34
e.com 45
f.com 56
g.com 22
h.com 45
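For anyone who wants to reproduce this, the frame above can be built as follows. This is a minimal sketch: it assumes COL1 holds strings and COL2 holds integers, which the printed table suggests but does not confirm.

```python
import pandas as pd

# Sample data copied from the table above; the dtypes are an assumption.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})
print(df)
```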
I want to get one row for each unique value in COL2:
COL1 COL2
a.com 22
b.com 45
c.com 34
f.com 56
So, how can I get that? I would appreciate it very much if anyone can provide any help.
Upvotes: 76
Views: 66631
Reputation: 17834
You can use groupby in combination with the first and last methods.
To get the first row from each group:
df.groupby('COL2', as_index=False).first()
Output:
COL2 COL1
0 22 a.com
1 34 c.com
2 45 b.com
3 56 f.com
To get the last row from each group:
df.groupby('COL2', as_index=False).last()
Output:
COL2 COL1
0 22 g.com
1 34 c.com
2 45 h.com
3 56 f.com
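Here is a self-contained sketch of the two calls above, with the question's sample data rebuilt by hand (the dtypes are assumed). Two things to keep in mind: groupby sorts the result by COL2 by default, and first/last return the first/last non-null value of each column rather than a whole row, so with missing data the result can mix values from different rows. If you need complete rows in their original order, groupby(...).head(1) is one alternative; it is shown here as an option and is not part of the approach above.

```python
import pandas as pd

# Sample data from the question; the dtypes are an assumption.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# First and last entry per COL2 value; the result is sorted by COL2.
first_rows = df.groupby("COL2", as_index=False).first()
last_rows = df.groupby("COL2", as_index=False).last()

# Alternative: keep whole rows, in the original order and with the original index.
first_in_order = df.groupby("COL2").head(1)

print(first_rows)
print(last_rows)
print(first_in_order)
```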
Upvotes: 1
Reputation: 862741
Use drop_duplicates and specify the column COL2 to check for duplicates:
df = df.drop_duplicates('COL2')
# same as
# df = df.drop_duplicates('COL2', keep='first')
print(df)
COL1 COL2
0 a.com 22
1 b.com 45
2 c.com 34
4 f.com 56
You can also keep only the last occurrence of each value (applied to the original df, not the deduplicated one):
df = df.drop_duplicates('COL2', keep='last')
print(df)
COL1 COL2
2 c.com 34
4 f.com 56
5 g.com 22
6 h.com 45
Or remove every row whose COL2 value occurs more than once (again applied to the original df):
df = df.drop_duplicates('COL2', keep=False)
print(df)
COL1 COL2
2 c.com 34
4 f.com 56
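To run the three variants side by side, here is a minimal sketch that rebuilds the sample data from the question (the dtypes are assumed) and stores each result in its own variable, so every call works on the original df. The variable names are only illustrative.

```python
import pandas as pd

# Sample data from the question; the dtypes are an assumption.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first occurrence of each COL2 value (the default).
keep_first = df.drop_duplicates(subset="COL2")

# Keep the last occurrence of each COL2 value.
keep_last = df.drop_duplicates(subset="COL2", keep="last")

# Drop every row whose COL2 value is duplicated anywhere.
keep_none = df.drop_duplicates(subset="COL2", keep=False)

# Optional: renumber the index 0..n-1 after dropping rows.
keep_first_reset = keep_first.reset_index(drop=True)

print(keep_first)
print(keep_last)
print(keep_none)
```

The original index labels are preserved in each result, which is why the outputs above show gaps such as 0, 1, 2, 4.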
Upvotes: 106