I had an XLSX file with two columns, months and revenue, and saved it as a CSV file. After reading the CSV file with pandas, the revenue column now has dtype object. How can I convert this column to float?
data = pd.DataFrame
data['revenue']
7980.79
NaN
1000.25
17800.85
.....
NaN
2457.85
6789.33
This is the column I want to convert, but every attempt gives me a different error. I tried astype and to_numeric with no success.
One of the errors I got is:
Cannot parse a string '798.79'
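Before converting, it can help to find exactly which entries block the conversion. A minimal sketch, assuming the column was read as strings: coerce with pd.to_numeric and list the values that were present but did not parse. The sample values (a quoted number and a number with spaces) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: strings as read from a CSV, including a quoted value
# and a value that uses spaces as thousands separators
s = pd.Series(["7980.79", np.nan, "'798.79'", "16 004 258.24", "2457.85"], name="revenue")

# Unparsable entries become NaN instead of raising
coerced = pd.to_numeric(s, errors="coerce")

# Entries that existed in the original but could not be parsed as numbers
bad = s[coerced.isna() & s.notna()]
print(bad)
```

Whatever shows up in bad tells you what to clean (quotes, spaces, text like 'None') before the final conversion.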
Now, using nucsit026's answer, I created a slightly different DataFrame whose values are strings:
import numpy as np
import pandas as pd
dic = {'revenue':['7980.79',np.nan,'1000.25','17800.85','None','2457.85','6789.33']}
df = pd.DataFrame(dic)
print(df)
print(repr(df['revenue'].dtypes))
Output:
revenue
0 7980.79
1 NaN
2 1000.25
3 17800.85
4 None
5 2457.85
6 6789.33
dtype('O')
Try this:
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0)
errors='coerce' turns anything that cannot be parsed (such as the string 'None') into NaN, and fillna(0) then replaces every NaN with 0.
Output:
0 7980.79
1 0.00
2 1000.25
3 17800.85
4 0.00
5 2457.85
6 6789.33
Name: revenue, dtype: float64
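If a missing month should stay missing rather than count as zero revenue, one variant is to skip fillna and keep NaN. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

dic = {'revenue': ['7980.79', np.nan, '1000.25', '17800.85', 'None', '2457.85', '6789.33']}
df = pd.DataFrame(dic)

# Coerce unparsable strings (here 'None') to NaN and leave them as NaN,
# so sums and means skip them instead of treating them as 0
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce')
print(df['revenue'].dtype)
print(df['revenue'].sum())
```

Which one is right depends on what a blank revenue means in your data; filling with 0 changes averages, keeping NaN does not.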
EDIT:
Judging from the error you shared, if quote characters around the values are the problem, strip them first:
df['revenue'] = df['revenue'].str.strip("'")
and then convert to float with the code above.
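Putting the two steps together, a minimal sketch assuming the values are wrapped in single quotes (the sample data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values wrapped in single quotes, plus one missing value
df = pd.DataFrame({'revenue': ["'7980.79'", "'798.79'", np.nan, "'2457.85'"]})

# Strip leading/trailing single quotes (NaN passes through unchanged),
# then convert; anything still unparsable becomes NaN
df['revenue'] = pd.to_numeric(df['revenue'].str.strip("'"), errors='coerce')
print(df['revenue'])
```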
EDIT 2:
The OP's values contained spaces used as thousands separators, like this:
Month Revenue
Apr-13 16 004 258.24
May-13
Jun-13 16 469 157.71
Jul-13 19 054 861.01
Aug-13 20 021 803.71
Sep-13 21 285 537.45
Oct-13 22 193 453.80
Nov-13 21 862 298.20
Dec-13 10 053 557.64
Jan-14 17 358 063.34
Feb-14 19 469 161.04
Mar-14 22 567 078.21
Apr-14 20 401 188.64
In this case, remove the spaces first:
df['revenue'] = df['revenue'].replace(' ', '', regex=True)
and then perform the conversion as above.
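A combined sketch using a few rows from the table above. It uses the pattern r'\s+' with Series.str.replace instead of a single space; this is an assumption meant to also cover non-breaking spaces (U+00A0), which a spreadsheet export may use as thousands separators.

```python
import numpy as np
import pandas as pd

# Sample rows from the table above; May-13 has no value, so it is read as NaN.
# The non-breaking spaces in the last row are a hypothetical addition.
df = pd.DataFrame({
    'month': ['Apr-13', 'May-13', 'Jun-13', 'Jul-13'],
    'revenue': ['16 004 258.24', np.nan, '16 469 157.71', '19\xa0054\xa0861.01'],
})

# r'\s+' matches ordinary spaces and other Unicode whitespace such as U+00A0;
# NaN entries pass through .str.replace unchanged
cleaned = df['revenue'].str.replace(r'\s+', '', regex=True)
df['revenue'] = pd.to_numeric(cleaned, errors='coerce')
print(df)
```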
From the link above:
import pandas as pd
dic = {'revenue':[7980.79,None,1000.25,17800.85,None,2457.85,6789.33]}
df = pd.DataFrame(dic)
df['revenue'] = df['revenue'].astype(float)
df
output
revenue
0 7980.79
1 NaN
2 1000.25
3 17800.85
4 NaN
5 2457.85
6 6789.33
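Note that in this example the values are already numbers, so pandas stores the column as float64 before astype is even called. A sketch contrasting that with string data like the CSV in the question (the sample values are hypothetical):

```python
import numpy as np
import pandas as pd

# Numbers plus None: pandas infers float64 and turns None into NaN
numeric = pd.DataFrame({'revenue': [7980.79, None, 1000.25]})
print(numeric['revenue'].dtype)

# Strings read from a CSV may contain text that float() cannot parse, e.g. 'None'
text = pd.Series(['7980.79', np.nan, 'None', '2457.85'])
try:
    text.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparsable entries into NaN instead of raising
converted = pd.to_numeric(text, errors='coerce')
print(astype_failed, converted.tolist())
```

So astype(float) is fine once the column is clean, while to_numeric(..., errors='coerce') is the more forgiving choice for raw CSV strings.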