jason
jason

Reputation: 3962

pandas is not converting type string to float

I've tried replacing the ',' with blank field.

df['amount'].replace(',','', regex=True).astype(float)

Error:

ValueError: could not convert string to float: 

df['amount'] = df['amount'].astype('float64')

I still get the same error. The sample data looks like: 5,000.00 and 1,00,000.234 values.

How can I convert it to float?

Btw, I'm reading a json file! with only the path of the file.

Upvotes: 1

Views: 1282

Answers (3)

madan maram
madan maram

Reputation: 23

Our values like 5,000.00 would be converted to 5000.00 as a float:

df['Withdrawal Amt.'] = [float(str(i).replace(",", "")) for i in df['Withdrawal Amt.']]

Upvotes: 0

jezrael
jezrael

Reputation: 862641

I think need assign back:

df = pd.DataFrame({'amount':['5,000.00', '1,00,000.234']})

df['amount'] = df['amount'].replace(',','', regex=True).astype('float64')
print (df)
       amount
0    5000.000
1  100000.234

If does not work check if some bad values:

df = pd.DataFrame({'amount':['5,000.00', '1,00,000.234', 'a']})
print (df)
         amount
0      5,000.00
1  1,00,000.234
2             a

print (df.loc[pd.to_numeric(df['amount'].replace(',','', regex=True), errors='coerce').isnull(), 'amount'])
2    a
Name: amount, dtype: object

Then is possible convert bad values to NaNs:

df = pd.DataFrame({'amount':['5,000.00', '1,00,000.234', 'a']})
print (df)
         amount
0      5,000.00
1  1,00,000.234
2             a

df['amount'] = pd.to_numeric(df['amount'].replace(',','', regex=True), errors='coerce', downcast='float')
print (df)

       amount
0    5000.000
1  100000.234
2         NaN

If use pd.read_csv for DataFrame add parameter thousands=',':

df = pd.read_csv(file, thousands=',')

Upvotes: 1

jpp
jpp

Reputation: 164673

Using pandas.to_numeric with pd.Series.str.replace works for this:

s = pd.Series(['5,000.00', '1,00,000.234'])

s = pd.to_numeric(s.str.replace(',', ''), downcast='float')

print(s)

# 0      5000.000
# 1    100000.234
# dtype: float64

However, a better idea is to fix this at source, if possible. For example, pandas.read_csv has arguments which allow you to account for such numeric formatting.

Upvotes: 1

Related Questions