Reputation: 17144
I have a pandas series of various age ranges:
s = pd.Series([14,1524,2534,3544,65])
I would like to create a new series like this:
0 0-14
1 15-24
2 25-34
3 35-44
4 65+
I can do this using mapping:
import pandas as pd

s = pd.Series([14, 1524, 2534, 3544, 65])
age_map = {
    14: '0-14',
    1524: '15-24',
    2534: '25-34',
    3544: '35-44',
    4554: '45-54',
    5564: '55-64',
    65: '65+',
}
s.map(age_map)
Also, using multiple regexes:
s = pd.Series([14,1524,2534,3544,65])
s = s.astype(str).str.replace(r'(\d\d)(\d\d)', r'\1-\2',regex=True)
s = s.astype(str).str.replace(r'14', r'0-14',regex=True)
s = s.astype(str).str.replace(r'65', r'65+',regex=True)
s
Question
Can we combine all three regexes into one advanced regex and obtain the same result?
Something like:
s = pd.Series([14,1524,2534,3544,65])
pat = ''
pat_sub = ''
s = s.astype(str).str.replace(pat, pat_sub,regex=True)
s
Upvotes: 1
Views: 318
Reputation: 17144
I liked @coldspeed's answer, which is more flexible and uses a reusable function.
However, I also came up with a chained pandas operation like this:
s = (s.astype(str)
      .str.replace(r'14', r'0-14', regex=True)
      .str.replace(r'65', r'65+', regex=True)
      .str.replace(r'(\d\d)(\d\d)', r'\1-\2', regex=True))
s
Upvotes: 1
Reputation: 402463
You can use a single callback function to handle all the cases:
def parse_str(match):
    a, b = match.groups()
    if not b:
        # Two-digit value: '14' is the lowest bucket, anything else is open-ended.
        return f'0-{a}' if a == '14' else f'{a}+'
    return f'{a}-{b}'

s.astype(str).str.replace(r'(\d{2})(\d{2})?', parse_str, regex=True)
0 0-14
1 15-24
2 25-34
3 35-44
4 65+
dtype: object
This should work, assuming your Series contains only two- or four-digit numbers.
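If you want that assumption checked rather than taken on trust, here is a minimal self-contained sketch. The names format_age_range, as_text and result are only illustrative, and the anchored pattern and the fullmatch guard are additions of mine rather than part of the approach above. It also passes regex=True explicitly, since recent pandas versions no longer treat the pattern as a regex by default.

```python
import pandas as pd


def format_age_range(match):
    """Turn a two- or four-digit match into an age-range label."""
    low, high = match.groups()
    if high is None:
        # Only two digits: '14' is the lowest bucket, anything else is open-ended.
        return f'0-{low}' if low == '14' else f'{low}+'
    return f'{low}-{high}'


s = pd.Series([14, 1524, 2534, 3544, 65])
as_text = s.astype(str)

# Guard for the stated assumption: every value is exactly two or four digits.
assert as_text.str.fullmatch(r'\d{2}(?:\d{2})?').all()

# Anchoring with ^ and $ means the callback sees each whole value exactly once.
result = as_text.str.replace(r'^(\d{2})(\d{2})?$', format_age_range, regex=True)
print(result.tolist())
```

If the guard fails, the Series contains something other than a two- or four-digit code, and a plain dictionary mapping like the one in the question may be the safer choice.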
Upvotes: 3