Reputation: 101
Column 'Amount' is a string. I want to change it to float so that I can input these rows into a later calculation.
In [1] import pandas as pd
data = pd.read_csv('input.csv')
data
Out [1]
ID Amount Cost
0 A 9,596,249.09 1000000
1 B 38,385,668.57 50000
2 C 351,740.00 100
3 D - 23
4 E 178,255.96 999
Note the that 'D' has an Amount of ' - ' rather than zero.
First I clean up the bad data:
In [2]
data['Amount'] = data['Amount'].replace(' - ', 0)
data
Out [2]
ID Amount Cost
0 A 9,596,249.09 1000000
1 B 38,385,668.57 50000
2 C 351,740.00 100
3 D 0 23
4 E 178,255.96 999
Then I try to convert to float using 2 methods. Both unsuccessful:
In [3]
pd.Series(data['Amount']).astype(float)
Out [3]
ValueError: could not convert string to float: '9,596,249.09'
and:
In [4]
pd.to_numeric(data['Amount'])
Out [4]
ValueError: Unable to parse string "9,596,249.09" at position 0
In my desperation I attempt to loop through the rows:
In [5]
def cleandata(x):
return float(x)
data['Amount'] = data['Amount'].apply(cleandata)
Out [5]
ValueError: could not convert string to float: '9,596,249.09'
Appreciate any advice you could give. I have tried for hours. Thank you.
Upvotes: 1
Views: 268
Reputation: 101
Creating a list (y) seems to work.
In [1]:
import pandas as pd
data = pd.read_csv('input.csv')
y = list(data["Amount"])
y = [item.replace(" - " , '0') for item in y]
y = [item.replace("," , '') for item in y]
data["Amount"] = y
data["Amount"] = pd.to_numeric(data['Amount'], errors='coerce')
data['Result'] = data["Amount"] - data["Cost"]
data
Out [1]:
ID Amount Cost Result
0 A 9596249.09 1000000 8596249.09
1 B 38385668.57 50000 38335668.57
2 C 351740.00 100 351640.00
3 D 0.00 23 -23.00
4 E 178255.96 999 177256.9
There is certainly a better and more pythonic way to write this^ i'm sure.
Upvotes: 0
Reputation: 13
You should get rid of the commas, that way should fix the problem. Try this:
data['Amount'] = data['Amount'].apply(lambda x: x.replace(",", "")) # take the commas away
data['Amount'] = data.Amount.astype(float)
Upvotes: 1
Reputation: 150765
try:
data = pd.read_csv('input.csv', thousands=',', decimal='.')
Upvotes: 2