madsthaks

Reputation: 2181

Extracting numerical information from strings in a DataFrame column

I've seen this done in Excel, but I'd like to split the SOP prefix and the number into separate columns of a DataFrame. It gets a little tricky because the formatting is inconsistent: some values have a hyphen between the prefix and the number, and some don't.

0   SOP-015641
1   SOP-007809
2   SOP018262
3   SOP-007802
4   SOP-007804
5   SOP-007807

Upvotes: 2

Views: 45

Answers (1)

MaxU - stand with Ukraine

Reputation: 210972

Use the .str.extract() method:

In [8]: df[['a','b']] = df.pop('col').str.extract(r'(\D+)(\d+)', expand=True)

In [9]: df
Out[9]:
      a       b
0  SOP-  015641
1  SOP-  007809
2   SOP  018262
3  SOP-  007802
4  SOP-  007804
5  SOP-  007807

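RegEx explained: (\D+) captures one or more non-digit characters (the prefix, including the hyphen when there is one), and (\d+) captures the run of digits that follows.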
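Note that with this pattern the hyphen ends up in the first column ('SOP-' vs. 'SOP'). If you'd rather have a clean prefix, one possible variation is to match the hyphen optionally outside the capture groups. This is only a sketch: the column names prefix, number and number_int are my own placeholders, and it assumes the prefix is always plain letters.

```python
import pandas as pd

# Sample data from the question; 'col' is the column name used in the answer.
df = pd.DataFrame({'col': ['SOP-015641', 'SOP-007809', 'SOP018262',
                           'SOP-007802', 'SOP-007804', 'SOP-007807']})

# ([A-Za-z]+) captures the letter prefix, -? skips an optional hyphen,
# (\d+) captures the digits. Rows that do not match come back as NaN.
parts = df['col'].str.extract(r'([A-Za-z]+)-?(\d+)', expand=True)
parts.columns = ['prefix', 'number']  # hypothetical column names

# The digits are still strings, which preserves the leading zeros.
# Convert explicitly only if you need real integers.
parts['number_int'] = parts['number'].astype(int)

print(parts)
```

Keeping number as a string is deliberate: converting to int drops the leading zeros, so do that in a separate column if the zero-padded ID matters.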

Upvotes: 2
