Reputation: 738
I have a pandas dataframe that I am trying to manipulate so that the 'amount' column converts from a string ("hh.hh 'hours'") to an int or float.
I am wondering what the best practice is in this situation. I have tried pd.to_numeric(), but without success. I think my problem is with the 'hours' tacked on the end of each string.
Is there another function that will identify the numeric characters and simply ignore the 'hours' portion of the string, or do I first need to trim the trailing text before using the built-in dtype conversion function (pd.to_numeric)? Thanks!
day amount
2018-08-23 3 24.00 hours
2018-08-24 4 8.00 hours
2018-08-25 5 32.00 hours
2018-08-26 6 24.00 hours
2018-08-27 0 24.00 hours
Upvotes: 2
Views: 1787
Reputation: 50426
Depending on the integrity of your data, you can use pd.to_timedelta to convert the column to a duration (Timedelta):
>>> df.amount = pd.to_timedelta(df.amount)
>>> df
date day amount
0 2018-08-23 3 1 days 00:00:00
1 2018-08-24 4 0 days 08:00:00
2 2018-08-25 5 1 days 08:00:00
3 2018-08-26 6 1 days 00:00:00
4 2018-08-27 0 1 days 00:00:00
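If you want a plain float of hours rather than a Timedelta, as the question asks, one option is to divide the total seconds by 3600. This is only a sketch: the frame below is rebuilt by hand from the sample in the question, and the amount_hours column name is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed layout: date index).
df = pd.DataFrame(
    {
        "day": [3, 4, 5, 6, 0],
        "amount": ["24.00 hours", "8.00 hours", "32.00 hours",
                   "24.00 hours", "24.00 hours"],
    },
    index=pd.to_datetime(
        ["2018-08-23", "2018-08-24", "2018-08-25", "2018-08-26", "2018-08-27"]
    ),
)

# Parse the strings as durations, then express each duration in hours.
durations = pd.to_timedelta(df["amount"])
df["amount_hours"] = durations.dt.total_seconds() / 3600  # hypothetical column name
```

Keeping the Timedelta column is handy if you later add the durations to timestamps; the float column is easier to sum or plot.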
Upvotes: 2
Reputation: 59519
Just use the string methods to get only the numbers that matter. There are a lot of options available depending upon how messy or formatted your column is:
import pandas as pd
df['amount'] = pd.to_numeric(df.amount.str.replace('hours', ''), downcast='integer')
# or
df['amount'] = pd.to_numeric(df.amount.str[:-5], downcast='integer')
# or
df['amount'] = pd.to_numeric(df.amount.str.extract(r'(\d+\.?\d*)')[0], downcast='integer')
All three produce the same output:
day amount
2018-08-23 3 24
2018-08-24 4 8
2018-08-25 5 32
2018-08-26 6 24
2018-08-27 0 24
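If the column might contain entries that do not match the pattern, the regex approach can be combined with errors='coerce' so that bad rows become NaN instead of raising. A minimal sketch, assuming made-up sample values (the 'n/a' and '32.5 hours' entries are not from the question):

```python
import pandas as pd

# Hypothetical messy column: one entry has no number at all.
raw = pd.Series(["24.00 hours", "8.00 hours", "n/a", "32.5 hours"])

# Pull out the first number in each string; rows with no match become NaN.
numbers = raw.str.extract(r"(\d+\.?\d*)", expand=False)

# Convert to a numeric dtype; anything unparseable is coerced to NaN.
hours = pd.to_numeric(numbers, errors="coerce")
```

You can then inspect hours.isna() to see which rows failed to parse before deciding whether to drop or fix them.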
Upvotes: 2