Reputation: 445
I have a pandas df:
name time
1 a 1 year 2 months
2 b 4 years 1 month
3 c 3 years 1 month
I want to end with:
name years months
1 a 1 2
2 b 4 1
3 c 3 1
I can get as far as:
name time
1 a [1, 2]
2 b [4, 1]
3 c [3, 1]
but I can't figure out how to split the lists into columns.
Upvotes: 2
Views: 1116
Reputation: 109526
df = pd.DataFrame({'name': ['a', 'b', 'c'],
'time': ['1 year 2 months', '4 years 1 month', '3 years 1 month']})
# Split the time column and take the first and third elements to extract the values.
df[['years', 'months']] = df.time.str.split(expand=True).iloc[:, [0, 2]].astype(int)
>>> df
name time years months
0 a 1 year 2 months 1 2
1 b 4 years 1 month 4 1
2 c 3 years 1 month 3 1
You can use del df['time']
when you're ready to drop that column.
Upvotes: 4
Reputation: 31662
You could use str.findall
to find digits in your time columns and then with str.join
and str.split
you could get your result:
In [240]: df.time.str.findall('\d').str.join('_').str.split('_', expand=True)
Out[240]:
0 1
0 1 2
1 4 1
2 3 1
df[['years', 'months']] = df.time.str.findall('\d').str.join('_').str.split('_', expand=True)
In [245]: df
Out[245]:
name time years months
0 a 1 year 2 months 1 2
1 b 4 years 1 month 4 1
2 c 3 years 1 month 3 1
It's a bit faster then @Alexander's solution, and I think more general. From timing:
In [6]: %timeit df.time.str.split(expand=True).iloc[:, [0, 2]]
1000 loops, best of 3: 1.6 ms per loop
In [8]: %timeit df.time.str.findall('\d').str.join('_').str.split('_', expand=True)
1000 loops, best of 3: 1.43 ms per loop
Upvotes: 2