Reputation: 25
I have two columns in dataframe df
ID Name
AXD2 SAM S
AXD2 SAM
SCA4 JIM
SCA4 JIM JONES
ASCQ JOHN
I need the output to get a unique id and should match the first name only,
ID Name
AXD2 SAM S
SCA4 JIM
ASCQ JOHN
Any suggestions?
Upvotes: 1
Views: 48
Reputation: 3046
You can use cumcount() to find the first iteration name of the ID
df['RN'] = df.groupby(['ID']).cumcount() + 1
df = df.loc[df['RN'] == 1]
df[['ID', 'Name']]
Upvotes: 0
Reputation: 640
You can use groupby
with agg
and get first
of Name
df.groupby(['ID']).agg(first_name=('Name', 'first')).reset_index()
Upvotes: 1
Reputation: 120391
Use drop_duplicates
:
out = df.drop_duplicates('ID', ignore_index=True)
print(out)
# Output
ID Name
0 AXD2 SAM S
1 SCA4 JIM
2 ASCQ JOHN
Upvotes: 0