read_excel function changes some characters to unicode ones since pandas package upgrade

Question

I have upgraded the pandas package to version 1.4.2 and the xlrd package to version 2.0.1.

Now, when I read an Excel file with the following command:

import pandas as pd
pd.read_excel('myfile.xlsx')

I got the following warning: UserWarning: Workbook contains no default style, apply openpyxl's default

And my result is:

foo	bar
_x0031_01259	COMMUNAUTE_x0020_DE_x0020_COMMUNES_x0020_FIER_x0020_ET_x0020_USSES

While it should be :

foo	bar
101259	COMMUNAUTE DE COMMUNES FIER ET USSES

So in some columns, 1 is replaced by _x0031_, 2 by _x0032_, a space by _x0020_, etc.
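To make the pattern concrete: each substitution looks like _xHHHH_, where HHHH is the hexadecimal code point of the original character. Below is a minimal sketch of decoding those escapes after loading, assuming every affected cell follows exactly that pattern. The helper name decode_escapes is hypothetical, not part of pandas or openpyxl, and this only works around the symptom rather than fixing how the file is read.

```python
import re

# Matches escapes such as _x0031_ or _x0020_ (four hex digits between "_x" and "_").
_ESCAPE_RE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_escapes(value):
    """Replace each _xHHHH_ escape with the character it encodes.

    Hypothetical helper for illustration; non-string values are returned unchanged.
    """
    if not isinstance(value, str):
        return value  # leave numbers, NaN, etc. untouched
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), value)


print(decode_escapes("_x0031_01259"))
print(decode_escapes("COMMUNAUTE_x0020_DE_x0020_COMMUNES_x0020_FIER_x0020_ET_x0020_USSES"))
```

On pandas 1.4.2 this could probably be applied to the whole result with df.applymap(decode_escapes), though it would also alter any cell that legitimately contains text of the form _xHHHH_.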

I tried setting the engine parameter to openpyxl, but I get the same warning and the same DataFrame.

Before the pandas upgrade I already had this problem, but with the engine parameter set to xlrd it worked (although I got a warning saying that newer versions would not support xlrd).

Any idea how to read the file correctly?

