Pandas replacing a value in dataframe with conditions on string

Question

I need a bit of help:

I have two columns: id and class:

df:

id      class
AB001   NaN
AB002   NaN
CDE001  NaN
CDE002  NaN

and what I'd like is that if id begins with AB, then class is AB, but if id begins with CDE, then class is CDE,

so I'll end up with:

id      class
AB001   AB  
AB002   AB
CDE001  CDE
CDE002  CDE

I just can't get my head around it - can someone help? Thank you!

Quang Hoang · Accepted Answer

It looks like you want to clip all the trailing digits, so:

df['class'] = df['id'].str.extract(r'^(\D+)')[0]

Output:

       id class
0   AB001    AB
1   AB002    AB
2  CDE001   CDE
3  CDE002   CDE
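
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample in the question, and the pattern is written as a raw string so that `\D` reaches the regex engine without an invalid-escape warning from Python.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; class starts out as NaN
df = pd.DataFrame({
    "id": ["AB001", "AB002", "CDE001", "CDE002"],
    "class": np.nan,
})

# ^(\D+) captures the leading run of non-digit characters.
# str.extract returns a DataFrame with one column per capture group,
# so [0] selects the first (and only) group as a Series.
df["class"] = df["id"].str.extract(r"^(\D+)")[0]

print(df)
```

Note that this keeps only what comes before the first digit, which matches the sample ids but would cut a prefix short if the prefix itself contained a digit.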

update: per your comment, you can use rstrip:

df['class'] = df['id'].str.rstrip('0123456789')

or still with extract:

df['class'] = df['id'].str.extract(r'^(.*\D)\d+$')[0]
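
The three variants agree on the sample data but can differ on other inputs. The sketch below compares them side by side; the ids `A1B003` (a digit inside the prefix) and `XYZ` (no trailing digits) are hypothetical values added for illustration and are not from the question.

```python
import pandas as pd

# "A1B003" and "XYZ" are made-up ids used only to show where the variants differ
ids = pd.Series(["AB001", "CDE002", "A1B003", "XYZ"])

# Leading non-digits only: stops at the first digit
leading = ids.str.extract(r"^(\D+)")[0]

# Strip any of the characters 0-9 from the right-hand end
stripped = ids.str.rstrip("0123456789")

# Everything up to the last non-digit, but only if digits follow it
greedy = ids.str.extract(r"^(.*\D)\d+$")[0]

print(pd.DataFrame({"id": ids, "leading": leading,
                    "rstrip": stripped, "greedy": greedy}))
```

On these inputs `rstrip` and the second `extract` both keep `A1B`, while the first pattern keeps only `A`. For `XYZ` the second `extract` finds no match and yields NaN, whereas `rstrip` leaves the value unchanged.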
