chitown88

Reputation: 28630

Split Column at leading numbers only

I worked out how to split a column that contains both numbers and letters, and I have found solutions that separate numbers from letters. However, my approach drops any digits that come after the leading digits, and I cannot find a fix (I am still learning regex).

Quick example:

import pandas as pd
import numpy as np

data = np.array([['Col1','Col2'],
                ['1','05MW'],
                ['2','16MW'],
                ['3','05SW1'],
                ['4','05SW2']])

df = pd.DataFrame(data=data[1:,:],
                  index=data[1:,0],
                  columns=data[0,:])

df[['Col2', 'id']] = df['Col2'].str.extract(r'(\d+)([A-Za-z]*)', expand=True)

Gives:

print (df)
  Col1 Col2  id
1    1   05  MW
2    2   16  MW
3    3   05  SW
4    4   05  SW

However, I don't want to lose anything that follows the leading digits, including numbers. I am trying to achieve this output:

print (df)
  Col1 Col2  id
1    1   05  MW
2    2   16  MW
3    3   05  SW1
4    4   05  SW2

Upvotes: 0

Views: 36

Answers (1)

jezrael

Reputation: 863226

Add 0-9 to the second character class so that it also matches digits:

df[['Col2', 'id']] = df['Col2'].str.extract(r'(\d+)([A-Za-z0-9]*)', expand=True)

Or use .* to capture everything that follows the leading digits:

df[['Col2', 'id']] = df['Col2'].str.extract(r'(\d+)(.*)', expand=True)

print (df)
  Col1 Col2   id
1    1   05   MW
2    2   16   MW
3    3   05  SW1
4    4   05  SW2
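
The two patterns only differ when the remainder contains something other than letters and digits. A minimal sketch of that difference follows. The sample value '05-SW3' and the variable names are assumptions added for illustration and are not part of the question's data. The third pattern also adds ^, since str.extract searches anywhere in the string and ^ pins the first group to the start of the value:

```python
import pandas as pd

# Sample values from the question, plus one hypothetical value ('05-SW3')
# that contains a character which is neither a letter nor a digit.
s = pd.Series(['05MW', '16MW', '05SW1', '05SW2', '05-SW3'])

# Original pattern: the second group stops at the first non-letter.
letters_only = s.str.extract(r'(\d+)([A-Za-z]*)', expand=True)

# Letters and digits: the second group stops at the first other character.
alnum = s.str.extract(r'(\d+)([A-Za-z0-9]*)', expand=True)

# Anchored at the start; the second group takes the rest of the string.
rest = s.str.extract(r'^(\d+)(.*)', expand=True)

print(letters_only[1].tolist())  # ['MW', 'MW', 'SW', 'SW', '']
print(alnum[1].tolist())         # ['MW', 'MW', 'SW1', 'SW2', '']
print(rest[1].tolist())          # ['MW', 'MW', 'SW1', 'SW2', '-SW3']
```

If the values can only ever contain letters and digits, both variants give the same result. If anything at all after the leading digits should be kept, (.*) is likely the safer choice.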

Upvotes: 2
