Reputation: 3173
I have a sample database with one column:
import pandas as pd
d = {
'question#': ['a1.2','a10','a10.1','b11.1a','k20.3d','b20c']
}
df = pd.DataFrame(d)
it looks like this:
Out[8]:
question#
0 a1.2
1 a10
2 a10.1
3 b11.1a
4 k20.3d
5 b20c
there isn't any way to sort number and letter mixed column correctly, so i thought the only way to do that is to first split the column into 3 columns:
first column: a letter: (a-z), the string always starts with one letter
second column: two possible outcomes:
single digit or multiple digits: (1-9)+
OR
digits + '.' + digits: (1-9)+(/.)(1-9)+
third column: a letter or nothing: (a-z)?
so for the sample database i want it to split into the following columns, DESIRED OUTPUT:
Out[8]:
question# firstcol secondcol thirdcol
0 a 1.2
1 a 10
2 a 10.1
3 b 11.1 a
4 k 20.3 d
5 b 20 c
is the syntax something like this page? i am not sure how exactly to write the regex syntax:
https://chrisalbon.com/python/pandas_regex_to_create_columns.html
df['firstcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
df['secondcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
df['thirdcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
Upvotes: 2
Views: 4050
Reputation: 38415
Try
df[['firstcol', 'secondcol', 'thirdcol']] = df['question#'].str.extract('([A-Za-z]+)(\d+\.?\d*)([A-Za-z]*)', expand = True)
question# firstcol secondcol thirdcol
0 a1.2 a 1.2
1 a10 a 10
2 a10.1 a 10.1
3 b11.1a b 11.1 a
4 k20.3d k 20.3 d
5 b20c b 20 c
Upvotes: 2