I need a bit of help. I have a DataFrame with two columns, id and class:
df:
id class
AB001 NaN
AB002 NaN
CDE001 NaN
CDE002 NaN
What I'd like is this: if id begins with AB, then class is AB, but if id begins with CDE, then class is CDE, so I'll end up with:
id class
AB001 AB
AB002 AB
CDE001 CDE
CDE002 CDE
I just can't get my head around it. Can someone help? Thank you!
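If the class can't always be derived from a character pattern, the rule as stated (id starting with AB gives class AB, id starting with CDE gives class CDE) can be translated literally into a prefix lookup. This is a minimal sketch, assuming numpy is available; the prefixes dict is a hypothetical mapping you would extend with your real prefixes, and ids matching no prefix get None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": ["AB001", "AB002", "CDE001", "CDE002"]})

# Hypothetical prefix -> class mapping; extend it with your real prefixes.
prefixes = {"AB": "AB", "CDE": "CDE"}

# One boolean mask per prefix; np.select takes the first condition that is True.
conditions = [df["id"].str.startswith(p) for p in prefixes]
choices = list(prefixes.values())

# Rows that match no prefix fall back to the default (None here).
df["class"] = np.select(conditions, choices, default=None)
print(df)
```

Because np.select uses the first matching condition, list longer prefixes before shorter ones if one prefix can start another (e.g. put "ABC" before "AB").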
It looks like you want to clip all the trailing digits, so:
df['class'] = df['id'].str.extract(r'^(\D+)')[0]  # leading non-digit characters
Output:
id class
0 AB001 AB
1 AB002 AB
2 CDE001 CDE
3 CDE002 CDE
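The trailing [0] is needed because, with the default expand=True, str.extract returns a DataFrame with one column per capture group, labelled 0, 1, ... for unnamed groups. A short sketch showing this, and the expand=False form that returns a Series directly for a single group:

```python
import pandas as pd

df = pd.DataFrame({"id": ["AB001", "AB002", "CDE001", "CDE002"]})

# One unnamed capture group + default expand=True -> one-column DataFrame, column 0.
extracted = df["id"].str.extract(r"^(\D+)")
print(type(extracted).__name__)  # DataFrame
print(list(extracted.columns))   # [0]

# Selecting column 0 yields a Series that can be assigned as a new column.
df["class"] = extracted[0]

# expand=False returns a Series directly when there is a single group.
same = df["id"].str.extract(r"^(\D+)", expand=False)
print(same.equals(df["class"]))  # True
```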
Update: per your comment, you can use rstrip:
df['class'] = df['id'].str.rstrip('0123456789')
or, still with extract:
df['class'] = df['id'].str.extract(r'^(.*\D)\d+$')[0]
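The three patterns agree on the ids in the question but differ once an id has a digit inside its prefix or no trailing digits at all. A small comparison sketch; "AB1C001" and "ABC" are hypothetical ids, not taken from the question or the comment:

```python
import pandas as pd

# "AB1C001" and "ABC" are hypothetical ids used only to show where the methods differ.
s = pd.Series(["AB001", "CDE002", "AB1C001", "ABC"])

# Everything up to the first digit.
leading_non_digits = s.str.extract(r"^(\D+)", expand=False)
# Strip any run of trailing digits (nothing to strip -> id unchanged).
stripped = s.str.rstrip("0123456789")
# Everything before the trailing digits; requires at least one trailing digit.
before_trailing_digits = s.str.extract(r"^(.*\D)\d+$", expand=False)

print(pd.DataFrame({"id": s,
                    "extract_leading": leading_non_digits,
                    "rstrip": stripped,
                    "extract_trailing": before_trailing_digits}))
```

For "AB1C001", ^(\D+) stops at the first digit ("AB"), while rstrip and ^(.*\D)\d+$ keep "AB1C". For "ABC", rstrip returns the id unchanged, whereas ^(.*\D)\d+$ gives NaN because it needs at least one trailing digit.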
Another approach, using a regex with str.extract:
Example:
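import pandas as pd

df = pd.DataFrame({"id": ['AB001', 'AB002', 'CDE001', 'CDE002']})
# expand=False returns a Series (not a one-column DataFrame) for a single group
df['Class'] = df['id'].str.extract(r"^([A-Z]+)", expand=False)
print(df)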
Output:
id Class
0 AB001 AB
1 AB002 AB
2 CDE001 CDE
3 CDE002 CDE
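Note that [A-Z]+ only matches uppercase ASCII letters, so an id with a lowercase prefix yields NaN. A sketch assuming such ids can occur ("ab003" is hypothetical); str.extract accepts a flags argument from the re module to make the match case-insensitive:

```python
import re
import pandas as pd

# "ab003" is a hypothetical lowercase id, not from the question.
df = pd.DataFrame({"id": ["AB001", "CDE002", "ab003"]})

# Case-sensitive: the lowercase id does not match [A-Z]+ and becomes NaN.
df["Class"] = df["id"].str.extract(r"^([A-Z]+)", expand=False)

# With re.IGNORECASE the same pattern also matches lowercase letters.
df["Class_ci"] = df["id"].str.extract(r"^([A-Z]+)", flags=re.IGNORECASE, expand=False)
print(df)
```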