donald smith
donald smith

Reputation: 25

Match two columns in dataframe

I have two columns in dataframe df

  ID      Name
AXD2     SAM S
AXD2       SAM
SCA4       JIM
SCA4 JIM JONES
ASCQ      JOHN

I need the output to get a unique id and should match the first name only,

  ID  Name
AXD2 SAM S
SCA4   JIM
ASCQ  JOHN

Any suggestions?

Upvotes: 1

Views: 48

Answers (3)

ArchAngelPwn
ArchAngelPwn

Reputation: 3046

You can use cumcount() to find the first iteration name of the ID

df['RN'] = df.groupby(['ID']).cumcount() + 1
df = df.loc[df['RN'] == 1]
df[['ID', 'Name']]

Upvotes: 0

Onur Guven
Onur Guven

Reputation: 640

You can use groupby with agg and get first of Name

df.groupby(['ID']).agg(first_name=('Name', 'first')).reset_index()

Upvotes: 1

Corralien
Corralien

Reputation: 120391

Use drop_duplicates:

out = df.drop_duplicates('ID', ignore_index=True)
print(out)

# Output
     ID   Name
0  AXD2  SAM S
1  SCA4    JIM
2  ASCQ   JOHN

Upvotes: 0

Related Questions