Reputation: 103
I have been stuck on this seemingly simple problem for hours. I would like to convert the following strings to a total number of minutes (or, if possible, to separate hours and minutes).
foo['stringtime'] = pd.Series(['1 hour and 59 minutes','2 hours', np.nan, '38 minutes', '4 hours and 31 minutes'])
#What I've tried:
foo['stringtime'] = foo['stringtime'].str.replace(r'hours?','').str.replace(' minutes','').str.split(' and ')
However, this leaves '2 hours' and '38 minutes' as ['2'] and ['38'], so there is no longer any way to tell hours from minutes.
#What I would like to happen:
foo.head()
output:
119
120
NaN (or 0)
38
271
Is there an elegant, Pythonic way to do this?
Upvotes: 0
Views: 187
Reputation: 1441
Another way might be to use numexpr to evaluate each string as an arithmetic expression:
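import numexpr
import numpy as np
import pandas as pd
foo = pd.Series(['1 hour and 59 minutes','2 hours', np.nan, '38 minutes', '4 hours and 31 minutes'])
# Turn each string into an expression such as '1*60+59', then evaluate it.
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching.
(foo.str.replace(r' hours?','*60', regex=True).str.replace(r' minutes?','', regex=True).str.replace(' and ', '+')
.fillna('0').apply(numexpr.evaluate))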
Output:
0 119
1 120
2 0
3 38
4 271
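To see why the replacement chain works, it may help to look at the intermediate strings before anything is evaluated. This is only a sketch: it assumes pandas with explicit regex=True on the pattern-based replacements, and the name expr is illustrative rather than part of the answer above.

```python
import numpy as np
import pandas as pd

foo = pd.Series(['1 hour and 59 minutes', '2 hours', np.nan,
                 '38 minutes', '4 hours and 31 minutes'])

# Build the arithmetic expression for each row without evaluating it yet.
expr = (foo.str.replace(r' hours?', '*60', regex=True)   # '1 hour' -> '1*60'
           .str.replace(r' minutes?', '', regex=True)    # '59 minutes' -> '59'
           .str.replace(' and ', '+', regex=False)       # join the two parts
           .fillna('0'))                                 # missing rows count as 0

print(expr.tolist())
# ['1*60+59', '2*60', '0', '38', '4*60+31']
```

Each of these strings is plain arithmetic, which is what gets handed to the evaluator in the last step.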
Upvotes: 1
Reputation: 82765
Try using a regex to pull out the hours and the minutes separately.
For example:
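import re
import numpy as np
import pandas as pd

def p_time(val):
    try:
        t = 0
        # Hours, if present, contribute 60 minutes each.
        h = re.search(r"(\d+) hours?", val)
        if h:
            t += int(h.group(1)) * 60
        # Minutes, if present, are added as they are.
        m = re.search(r"(\d+) minutes?", val)
        if m:
            t += int(m.group(1))
        return t
    except TypeError:
        # Non-string values such as NaN fall through and count as 0.
        pass
    return 0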
s = pd.Series(['1 hour and 59 minutes','2 hours', np.nan, '38 minutes', '4 hours and 31 minute'])
print(s.apply(p_time).astype(int))
Output:
0 119
1 120
2 0
3 38
4 271
dtype: int32
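The same idea can probably be written without a Python-level function by using pandas' own str.extract, which also makes it easy to get the "hours and minutes" split mentioned in the question. This is a sketch under assumptions: the names hours, minutes, total, hh and mm are illustrative, and missing values are treated as 0 as in the output above.

```python
import numpy as np
import pandas as pd

s = pd.Series(['1 hour and 59 minutes', '2 hours', np.nan,
               '38 minutes', '4 hours and 31 minutes'])

# Pull each number out by its unit; rows without a match (or NaN rows) become NaN.
hours = s.str.extract(r'(\d+) hours?', expand=False).astype(float)
minutes = s.str.extract(r'(\d+) minutes?', expand=False).astype(float)

# A missing component counts as 0 minutes.
total = (hours.fillna(0) * 60 + minutes.fillna(0)).astype(int)
print(total.tolist())  # [119, 120, 0, 38, 271]

# Split the total back into whole hours and leftover minutes.
hh = total // 60
mm = total % 60
print(list(zip(hh.tolist(), mm.tolist())))
# [(1, 59), (2, 0), (0, 0), (0, 38), (4, 31)]
```

If NaN should be preserved instead of becoming 0, one option is to drop the fillna calls on rows where both components are missing, at the cost of a float result.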
Upvotes: 1