AhmedWas

Reputation: 1277

How to check if an xlsx file is valid?

I'm using openpyxl to deal with Excel sheets. It works fine, but then I encountered a file that gives me the following error:

Traceback (most recent call last):
  File "/home/ute/OM/Python_Scripts/preparePlanFileFromExcelReport.py", line 13, in <module>
    wb = load_workbook(differenceReportFile)
  File "/usr/local/lib/python2.7/dist-packages/openpyxl/reader/excel.py", line 151, in load_workbook
    archive = _validate_archive(filename)
  File "/usr/local/lib/python2.7/dist-packages/openpyxl/reader/excel.py", line 118, in _validate_archive
    archive = ZipFile(f, 'r', ZIP_DEFLATED)
  File "/usr/lib/python2.7/zipfile.py", line 714, in __init__
    self._GetContents()
  File "/usr/lib/python2.7/zipfile.py", line 748, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 763, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

After some searching, I found that this error appears when the file is not a valid xlsx file.

I can open the file normally with MS Excel 2013, but how can I tell if this file is a valid xlsx file?

Upvotes: 1

Views: 5374

Answers (2)

John Y

Reputation: 14529

Your question is kind of self-answering: Your error message already tells you that (1) OpenPyXL cannot open the file, and (2) the reason is that the file isn't a valid zip file (and thus not a valid .xlsx file).
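
If you want to test a file before handing it to OpenPyXL, a rough pre-check along these lines may help. This is only a sketch: the function name looks_like_xlsx is made up for illustration, and the check assumes that a well-formed .xlsx package is a zip archive containing a [Content_Types].xml member. Passing the check does not guarantee that OpenPyXL can read the file.

```python
import zipfile


def looks_like_xlsx(path):
    """Heuristic check: is `path` a zip archive with a [Content_Types].xml member?

    Hypothetical helper for illustration; not part of openpyxl.
    """
    # is_zipfile returns False for anything that is not a zip archive,
    # which is the case that makes load_workbook raise BadZipfile.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: Office Open XML packages carry this member at the root.
        return "[Content_Types].xml" in archive.namelist()
```

For the file in the question, looks_like_xlsx(differenceReportFile) would presumably return False, since the traceback shows it is not a zip file.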

If for some reason you need the program to continue even though the file is invalid, you can use the usual try..except mechanism:

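from zipfile import BadZipfile

from openpyxl import load_workbook

wb = None
try:
    # differenceReportFile is the path from the question
    wb = load_workbook(differenceReportFile)
except BadZipfile:
    # Raised when the file is not a zip archive, so not a valid .xlsx
    print('Invalid zip file.')
# continue processing here; wb is still None if loading failed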

If you want to handle the possibility that the .xlsx file is really a .xls file that was simply misnamed, then you can use xlrd to read the file instead. Note that xlrd versions before 2.0 handle both .xls and .xlsx, while 2.0 and later read only .xls.
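
One way to tell a misnamed .xls from a real .xlsx without relying on the extension is to look at the first few bytes of the file. The sketch below assumes the usual signatures: non-empty zip archives start with PK\x03\x04, and legacy .xls files are OLE2 compound documents that start with D0 CF 11 E0 A1 B1 1A E1. The function name guess_container is hypothetical, and a matching signature only identifies the container, not that the content is a readable workbook.

```python
ZIP_MAGIC = b"PK\x03\x04"                         # local file header of a non-empty zip
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document header


def guess_container(path):
    """Return 'zip', 'ole2', or 'unknown' based on the file's leading bytes.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"    # candidate .xlsx: try openpyxl
    if head.startswith(OLE2_MAGIC):
        return "ole2"   # candidate .xls: try xlrd
    return "unknown"
```

A result of 'ole2' for a file named .xlsx would suggest the misnamed .xls case described above.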

If you want to be able to read ANY file that Excel can read (regardless of the file extension), your only realistic choice is to have Excel itself open the file, which you can do using the COM interface (PyWin32, pywinauto, xlwings, etc.).

Upvotes: 0

Charlie Clark

Reputation: 19507

If it really isn't a zip file then it really isn't an .xlsx file, because the zip container is part of the specification. However, Excel will open some files that are not actually Excel files as if they were. Some libraries rely on this, for example, to export a special kind of HTML that Excel can read.
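
To see whether a file that Excel opens is really one of these HTML exports, you could inspect its first bytes. This is a rough heuristic under the assumption that such exports begin with an HTML doctype, an html tag, or a table tag; the function name looks_like_html and the list of prefixes are illustrative, not taken from any library.

```python
UTF8_BOM = b"\xef\xbb\xbf"
HTML_PREFIXES = (b"<!doctype html", b"<html", b"<table")  # assumed typical openings


def looks_like_html(path, sample_size=1024):
    """Heuristic: does the file start with HTML-looking markup?

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        head = f.read(sample_size)
    # Drop a UTF-8 byte order mark, if present, before inspecting the markup.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    # Ignore leading whitespace and letter case.
    head = head.lstrip().lower()
    return head.startswith(HTML_PREFIXES)
```

If this returns True for a file with an .xlsx extension, the file is probably HTML that Excel tolerates rather than a workbook that openpyxl is expected to read.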

If you think that the file is correct and that the problem is with openpyxl then please submit a bug report together with a sample file.

Upvotes: 1
