Qi Yin

Reputation: 159

cannot convert string to numbers in pandas.read_excel

Issue

I have an Excel file in German number format, where a comma is used as the decimal separator.

I want to read the first column into pandas as numbers using the following code:

import pandas as pd
import numpy as np
tmp = pd.read_excel("test.xlsx", dtype = {"col1": np.float64})

It gives me the error

ValueError: Unable to convert column col1 to type <class 'numpy.float64'>

The problem lies in the Excel file itself: col1 appears to be stored as text. If I manually change col1 to number format in Excel, the error goes away.

Approach

I can first read col1 into pandas as object, then replace each "," with ".", and finally convert the strings to float.
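The three steps above could be sketched roughly as follows. This is only an illustration: the helper name comma_strings_to_float and the sample values are made up, and a small in-memory DataFrame stands in for the result of reading test.xlsx with col1 kept as strings.

```python
import pandas as pd


def comma_strings_to_float(series: pd.Series) -> pd.Series:
    """Convert strings such as "1,5" to floats such as 1.5 (hypothetical helper)."""
    # astype(str) guards against cells that Excel already stored as numbers.
    cleaned = series.astype(str).str.strip().str.replace(",", ".", regex=False)
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


# Stand-in for: pd.read_excel("test.xlsx", dtype={"col1": str})
# The values below are invented sample data, not taken from the real file.
tmp = pd.DataFrame({"col1": ["1,5", "2,25", "3"]})
tmp["col1"] = comma_strings_to_float(tmp["col1"])
```

Note that this sketch does not handle a "." used as a thousands separator (for example "1.234,5"); such values would end up as NaN.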

However

The approach is tedious. How can I solve this problem more efficiently?

Upvotes: 1

Views: 1309

Answers (1)

norie

Reputation: 9857

Unfortunately, older versions of pandas give read_excel no way to specify which decimal separator is being used (a decimal argument was only added in pandas 1.4).

What you could do though is create a function to do the conversion and pass it to read_excel as part of the converters argument.

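import pandas as pd

def fix_decimal(num):
    # Convert a numeric value with a comma as the decimal separator to float.
    # Cells that Excel already stores as numbers arrive as int/float, not str.
    if isinstance(num, str):
        num = num.strip().replace(',', '.')
    return float(num) if num else 0

tmp = pd.read_excel("test.xlsx", converters={0: fix_decimal})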

Upvotes: 2
