Reputation: 159
Issue
I have an Excel file in German format, so the numbers in the first column (col1) use a comma as the decimal separator, e.g. 1,5 instead of 1.5.
I want to read the first column into pandas as numbers using the following code:
import pandas as pd
import numpy as np
tmp = pd.read_excel("test.xlsx", dtype = {"col1": np.float64})
It raises the following error:
ValueError: Unable to convert column col1 to type <class 'numpy.float64'>
The problem lies in the Excel file itself: if I manually change col1 to the Number format in Excel, the error goes away.
Approach
I can first read col1 into pandas as object, then replace "," with ".", and finally convert the strings to float.
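A minimal sketch of this approach, assuming the column is called col1 as in the code above; the helper name comma_decimal_to_float is hypothetical. Casting to str first means cells that Excel already stores as numbers go through the same replace-then-cast path as the comma strings.

```python
import pandas as pd


def comma_decimal_to_float(series: pd.Series) -> pd.Series:
    """Convert values such as "1,5" (comma as decimal separator) to float.

    Hypothetical helper: every value is turned into a string first, so cells
    Excel already stored as numbers (e.g. 3.0) are handled the same way.
    """
    return (
        series.astype(str)
        .str.replace(",", ".", regex=False)  # literal replace, no regex
        .astype(float)
    )


# With the real file (assumed to be test.xlsx as in the question):
# tmp = pd.read_excel("test.xlsx", dtype={"col1": object})
# tmp["col1"] = comma_decimal_to_float(tmp["col1"])

# Stand-in for the object column that read_excel would return:
demo = pd.Series(["1,5", "2,25", 3.0], dtype=object)
print(comma_decimal_to_float(demo).tolist())  # [1.5, 2.25, 3.0]
```

Wrapping the three steps in one function keeps the per-file code to a single call, which removes most of the tedium of repeating them for each column.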
However
The approach is tedious. How can I solve this problem more efficiently?
Upvotes: 1
Views: 1309
Reputation: 9857
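Unfortunately, in pandas versions before 1.4.0 there is no way to tell read_excel which decimal separator is being used (pandas 1.4.0 added a decimal argument, e.g. decimal=",", for columns stored as text in Excel).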
What you can do, though, is write a function that does the conversion and pass it to read_excel through the converters argument. Its keys can be column positions (0 for the first column, as here) or column labels.
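def fix_decimal(num):
    # convert a value with a comma as decimal separator to float;
    # cells that Excel already stores as numbers arrive as int/float
    if isinstance(num, (int, float)):
        return float(num)
    # text such as "1,5" becomes 1.5; empty values become 0
    return float(num.replace(',', '.')) if num else 0

tmp = pd.read_excel("test.xlsx", converters={0: fix_decimal})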
Upvotes: 2