Andy

Reputation: 738

Python Pandas Convert String to int/float

I have a pandas dataframe that I am trying to manipulate so that the 'amount' column converts from a string ("hh.hh 'hours'") to an int or float.

I am wondering what the best practice is in this situation. I have tried pd.to_numeric(), but without success. I think my problem is with the 'hours' tacked on the end of each string.

Is there another function that will pick out the numeric characters and simply ignore the 'hours' portion of the string, or do I first need to trim the last 5 characters before using the built-in conversion function (pd.to_numeric)? Thanks!

           day  amount
2018-08-23  3   24.00 hours
2018-08-24  4   8.00 hours
2018-08-25  5   32.00 hours
2018-08-26  6   24.00 hours
2018-08-27  0   24.00 hours

Upvotes: 2

Views: 1787

Answers (2)

TheMaster

Reputation: 50426

Depending on the integrity of your data, you can use pd.to_timedelta to convert the column to a duration (Timedelta):

>>> df.amount = pd.to_timedelta(df.amount)
>>> df

    date      day   amount
0   2018-08-23  3   1 days 00:00:00
1   2018-08-24  4   0 days 08:00:00
2   2018-08-25  5   1 days 08:00:00
3   2018-08-26  6   1 days 00:00:00
4   2018-08-27  0   1 days 00:00:00
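
Since the question asks for an int or float rather than a duration, the Timedelta column can be turned into plain hours afterwards. A minimal sketch, assuming the sample data from the question; the column name `amount_hours` is only illustrative:

```python
import pandas as pd

# Sample frame rebuilt from the question (dates as the index).
df = pd.DataFrame(
    {
        "day": [3, 4, 5, 6, 0],
        "amount": ["24.00 hours", "8.00 hours", "32.00 hours",
                   "24.00 hours", "24.00 hours"],
    },
    index=pd.to_datetime(
        ["2018-08-23", "2018-08-24", "2018-08-25", "2018-08-26", "2018-08-27"]
    ),
)

# Parse the strings as durations, as in the answer above.
durations = pd.to_timedelta(df["amount"])

# Convert each duration to a float number of hours.
df["amount_hours"] = durations.dt.total_seconds() / 3600
```

This relies on every value being a string that pd.to_timedelta can parse, so it suits clean data better than messy data.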

Upvotes: 2

ALollz

Reputation: 59519

Just use the string methods to get only the numbers that matter. There are a lot of options available depending upon how messy or formatted your column is:

import pandas as pd

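# Remove the literal 'hours' suffix, then parse what is left
df['amount'] = pd.to_numeric(df.amount.str.replace('hours', ''), downcast='integer')
# or drop the last five characters ('hours')
df['amount'] = pd.to_numeric(df.amount.str[:-5], downcast='integer')
# or extract the first number with a regular expression (note the raw string)
df['amount'] = pd.to_numeric(df.amount.str.extract(r'(\d+\.?\d*)')[0], downcast='integer')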

All three give the same output:

            day  amount
2018-08-23    3      24
2018-08-24    4       8
2018-08-25    5      32
2018-08-26    6      24
2018-08-27    0      24
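
If the column is messier than the sample, the extract approach can be combined with errors="coerce" so that unparseable rows become NaN instead of raising. A rough sketch; the values in `amount` below are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical messy input: one fractional value and one non-numeric entry.
amount = pd.Series(["24.00 hours", "8.5 hours", "n/a", "32 hours"])

# Pull out the first number in each string; rows without a match become missing.
number = amount.str.extract(r"(\d+\.?\d*)", expand=False)

# Convert to numbers; anything that still cannot be parsed becomes NaN.
hours = pd.to_numeric(number, errors="coerce")
```

Note that downcast='integer' only produces an integer column when every value is a whole number, so a column with fractional or missing values stays float.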

Upvotes: 2
