Reputation: 1660
I have a ton of Excel files that each have a column where numbers are formatted as text. Excel gives the error "the number in this cell is formatted as text or preceded by an apostrophe" - see the third column where the cells have a green triangle.
My goal is to open all of these files in Pandas without having to manually open each of them and convert the column to number. However, pd.read_excel() fails with the following xlrd
error:
XLRDError: ZIP file contents not a known type of workbook
Unsurprisingly when I use xlrd
directly: wb = xlrd.open_workbook(filename)
I get the same error.
I also tried openpyxl: wb = openpyxl.load_workbook(filename)
, and it gives me this:
KeyError: "There is no item named 'xl/_rels/workbook.xml.rels' in the archive"
I confirmed that the file is openable by both pandas (xlrd) and openpyxl if I manually convert the column to number in excel and re-save the workbook.
Does anyone have any ideas?
Upvotes: 5
Views: 5450
Reputation: 667
Use "converters" while reading excel.
Docs.
Eg.:
df = pd.read_excel('yourfile.xlsx',sheetname='sheetname',header=0,converters={ "% Chg" : str })
Upvotes: 3