Reputation: 728
I have a dataframe:
Name Section
1 James P3
2 Sam 2.5C
3 Billy T35
4 Sarah A85
5 Felix 5I
How do I split numeric values into a separate column called Section_Number and also split alphabetic values to Section_Letter. Desired results
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5L 5 L
Upvotes: 4
Views: 5503
Reputation: 294488
We can use itertools.groupby
to group the contiguous alpha and non-alpha
from itertools import groupby
[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
[['3', 'P'], ['2.5', 'C'], ['35', 'T'], ['85', 'A'], ['5', 'I']]
We can manipulate this into new columns
from itertools import groupby
N, L = zip(
*[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
)
df.assign(Selection_Number=N, Selection_Letter=L)
Name Section Selection_Number Selection_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
Upvotes: 0
Reputation: 164773
If, as in your example, you have one letter in each name, you can sort and then slice:
def get_vals(x):
return ''.join(sorted(x, key=str.isalpha))
# apply ordering
vals = df['Section'].apply(get_vals)
# split numbers from letter
df['num'] = vals.str[:-1].astype(float)
df['letter'] = vals.str[-1]
print(df)
Name Section num letter
1 James P3 3.0 P
2 Sam 2.5C 2.5 C
3 Billy T35 35.0 T
4 Sarah A85 85.0 A
5 Felix 5I 5.0 I
Upvotes: 1
Reputation: 142206
It'll no doubt be slower but throwing out an alternative for completeness you can use str.extractall
to get named groups matching the patterns and consolidate the matches and join back to your DF...
new = df.join(
df.Section.str.extractall(r'(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)')
.groupby(level=0).first()
)
Result:
Name Section Section_Letter Section_Number
1 James P3 P 3
2 Sam 2.5C C 2.5
3 Billy T35 T 35
4 Sarah A85 A 85
5 Felix 5I I 5
Upvotes: 1
Reputation: 863216
Use str.replace
with str.extract
by [A-Z]+
for all uppercase strings:
df['Section_Number'] = df['Section'].str.replace('([A-Z]+)', '')
df['Section_Letter'] = df['Section'].str.extract('([A-Z]+)')
print (df)
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
For seelct also lowercase values:
df['Section_Number'] = df['Section'].str.replace('([A-Za-z]+)', '')
df['Section_Letter'] = df['Section'].str.extract('([A-Za-z]+)')
print (df)
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
Upvotes: 7