Reputation: 28630
I worked out how to split a column that mixes numbers and letters, and I have found solutions that separate the leading digits from the letters. My problem is that I lose any digits that come after the letters, and I cannot find a solution (I am also still learning how to use regex).
Quick example:
import pandas as pd
import numpy as np
data = np.array([['Col1','Col2'],
['1','05MW'],
['2','16MW'],
['3','05SW1'],
['4','05SW2']])
df = pd.DataFrame(data=data[1:,:],
index=data[1:,0],
columns=data[0,:])
df[['Col2', 'id']] = df['Col2'].str.extract(r'(\d+)([A-Za-z]*)', expand=True)
Gives:
print (df)
Col1 Col2 id
1 1 05 MW
2 2 16 MW
3 3 05 SW
4 4 05 SW
However, I don't want to lose anything, including numbers, that follow the leading numbers. I am trying to achieve this output:
print (df)
Col1 Col2 id
1 1 05 MW
2 2 16 MW
3 3 05 SW1
4 4 05 SW2
Upvotes: 0
Views: 36
Reputation: 863226
Add 0-9 to the character class so that digits after the letters are captured as well:
df[['Col2', 'id']] = df['Col2'].str.extract(r'(\d+)([A-Za-z0-9]*)', expand=True)
Or use .* to capture everything that follows the leading digits:
df[['Col2', 'id']] = df['Col2'].str.extract(r'(\d+)(.*)', expand=True)
print (df)
Col1 Col2 id
1 1 05 MW
2 2 16 MW
3 3 05 SW1
4 4 05 SW2
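To see how the three patterns differ, here is a minimal sketch using the standard re module instead of pandas (str.extract takes the groups of the first match, which re.search mimics here). The value '05SW-2' is a hypothetical addition, not from the question, included only to show where [A-Za-z0-9]* and .* diverge.

```python
import re

# Values from the question, plus one hypothetical value containing a hyphen.
values = ["05MW", "16MW", "05SW1", "05SW-2"]

patterns = {
    "letters only": r"(\d+)([A-Za-z]*)",
    "letters and digits": r"(\d+)([A-Za-z0-9]*)",
    "everything": r"(\d+)(.*)",
}

results = {}
for name, pattern in patterns.items():
    # Take the capture groups of the first match in each value.
    results[name] = [re.search(pattern, v).groups() for v in values]
    print(name, results[name])
```

With [A-Za-z0-9]* the second group stops at the first character that is not a letter or digit, so '05SW-2' yields 'SW'; with .* it yields 'SW-2'. Which one fits depends on whether the suffix can contain other characters.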
Upvotes: 2