Deng-guy

Reputation: 45

Pandas replacing a value in dataframe with conditions on string

I need a bit of help:

I have two columns: id and class:

df:

id      class
AB001   NaN
AB002   NaN
CDE001  NaN
CDE002  NaN

What I'd like is that if id begins with AB, then class is AB, but if id begins with CDE, then class is CDE, so I'll end up with:

id      class
AB001   AB  
AB002   AB
CDE001  CDE
CDE002  CDE

I just can't get my head around it - can someone help? Thank you!

Upvotes: 1

Views: 118

Answers (2)

Quang Hoang

Reputation: 150745

It looks like you want to clip all the trailing digits, so:

df['class'] = df['id'].str.extract(r'^(\D+)')[0]

Output:

       id class
0   AB001    AB
1   AB002    AB
2  CDE001   CDE
3  CDE002   CDE

Update: per your comment, you can use rstrip, which removes only the trailing digits and keeps any digits earlier in the id:

df['class'] = df['id'].str.rstrip('0123456789')

or still with extract:

df['class'] = df['id'].str.extract(r'^(.*\D)\d+$')[0]
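The three variants above agree on the ids in the question but differ once a digit appears inside the prefix. Here is a small sketch of that difference; the id 'A1B001' is a made-up example, not taken from the question.

```python
import pandas as pd

# 'A1B001' is a hypothetical id with a digit inside the prefix
df = pd.DataFrame({"id": ["AB001", "CDE002", "A1B001"]})

# Leading non-digits only: stops at the first digit it meets
leading = df["id"].str.extract(r"^(\D+)")[0]

# Strip trailing digits: digits inside the prefix are kept
stripped = df["id"].str.rstrip("0123456789")

# Greedy match up to the last non-digit before the trailing digits
greedy = df["id"].str.extract(r"^(.*\D)\d+$")[0]

print(pd.DataFrame({"id": df["id"], "leading": leading,
                    "stripped": stripped, "greedy": greedy}))
```

For 'A1B001' the first pattern gives 'A', while rstrip and the greedy pattern both give 'A1B'.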

Upvotes: 5

Rakesh

Reputation: 82765

Another approach, using a regex with str.extract to capture the leading uppercase letters.

Example:

import pandas as pd

df = pd.DataFrame({"id": ['AB001', 'AB002', 'CDE001', 'CDE002']})
# expand=False returns a Series for the single capture group
df['Class'] = df['id'].str.extract(r"^([A-Z]+)", expand=False)
print(df)

Output:

       id Class
0   AB001    AB
1   AB002    AB
2  CDE001   CDE
3  CDE002   CDE
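If the set of prefixes is known in advance, the condition can also be written literally as "id begins with X, so class is X". This is only a sketch under that assumption: the `prefixes` list is hypothetical, and ids matching none of the prefixes keep a missing value.

```python
import pandas as pd

df = pd.DataFrame({"id": ["AB001", "AB002", "CDE001", "CDE002"]})

# Assumed: the prefixes are known up front (hypothetical list)
prefixes = ["AB", "CDE"]

# Start with an empty object column, then fill it per prefix
df["class"] = None
for prefix in prefixes:
    # Boolean mask of the rows whose id starts with this prefix
    mask = df["id"].str.startswith(prefix)
    df.loc[mask, "class"] = prefix

print(df)
```

If one prefix is itself the start of another (say 'AB' and 'ABC'), the later one in the list overwrites the earlier, so list the shorter prefixes first.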

Upvotes: 1
