wkzhu
wkzhu

Reputation: 1660

Reading Excel file to Python fails due to number formatted as text

I have a ton of Excel files that each have a column where numbers are formatted as text. Excel gives the error "the number in this cell is formatted as text or preceded by an apostrophe" - see the third column where the cells have a green triangle.

enter image description here

My goal is to open all of these files in Pandas without having to manually open each of them and convert the column to number. However, pd.read_excel() fails with the following xlrd error:

XLRDError: ZIP file contents not a known type of workbook

Unsurprisingly when I use xlrd directly: wb = xlrd.open_workbook(filename) I get the same error.

I also tried openpyxl: wb = openpyxl.load_workbook(filename), and it gives me this:

KeyError: "There is no item named 'xl/_rels/workbook.xml.rels' in the archive"

I confirmed that the file is openable by both pandas (xlrd) and openpyxl if I manually convert the column to number in excel and re-save the workbook.

Does anyone have any ideas?

Upvotes: 5

Views: 5450

Answers (1)

murphy1310
murphy1310

Reputation: 667

Use "converters" while reading excel.
Docs.

Eg.:

df = pd.read_excel('yourfile.xlsx',sheetname='sheetname',header=0,converters={ "% Chg" : str })

Upvotes: 3

Related Questions