Reading Excel file to Python fails due to number formatted as text

Question

I have a ton of Excel files that each have a column where numbers are formatted as text. Excel gives the error "the number in this cell is formatted as text or preceded by an apostrophe" - see the third column where the cells have a green triangle.

My goal is to open all of these files in Pandas without having to manually open each of them and convert the column to number. However, pd.read_excel() fails with the following xlrd error:

XLRDError: ZIP file contents not a known type of workbook

Unsurprisingly when I use xlrd directly: wb = xlrd.open_workbook(filename) I get the same error.

I also tried openpyxl: wb = openpyxl.load_workbook(filename), and it gives me this:

KeyError: "There is no item named 'xl/_rels/workbook.xml.rels' in the archive"

I confirmed that the file is openable by both pandas (xlrd) and openpyxl if I manually convert the column to number in excel and re-save the workbook.

Does anyone have any ideas?

murphy1310 · Accepted Answer

Use "converters" while reading excel.
Docs.

Eg.:

df = pd.read_excel('yourfile.xlsx',sheetname='sheetname',header=0,converters={ "% Chg" : str })

Reading Excel file to Python fails due to number formatted as text

Answers (1)

Related Questions