bdlogin

Reputation: 23

XLRDError: Unsupported format, or corrupt file

I am getting an error while reading .xlsx files with pandas. It appears to open the file, since it reads the first 8 characters of a column name (employee id), but it then fails with the error below. I have seen a lot of posts about this error, but in none of them is the last part of the message a column name. Any suggestions?

In the dev environment, when I opened this file in Excel and loaded it onto the server again, it worked.

Error: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'Employee'

Upvotes: 2

Views: 1360

Answers (1)

pallupz

Reputation: 883

As noted in the release email, linked to from the release tweet, noted in the large orange warning that appears on the front page of the documentation, and, less orange but still present, in the readme on the repo and the release on PyPI:

xlrd has explicitly removed support for anything other than xls files.

This is due to potential security vulnerabilities relating to the use of xlrd version 1.2 or earlier for reading .xlsx files.

Solutions available:

  • Pin an older xlrd version, i.e. xlrd==1.2.0, OR
  • Use openpyxl with pandas:

Make sure you are on a recent version of pandas, at least 1.0.1, and preferably the latest release. Install openpyxl (https://openpyxl.readthedocs.io/en/stable/), then change your pandas code to:

pandas.read_excel('cat.xlsx', engine='openpyxl')
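If the same code path has to handle both legacy .xls and newer .xlsx files, the engine can be chosen from the file extension. This is only a minimal sketch: `pick_engine`, `read_workbook` and the `ENGINE_BY_SUFFIX` mapping are hypothetical names made up for illustration, and the mapping is not exhaustive.

```python
from pathlib import Path

# Illustrative extension-to-engine mapping (an assumption, not exhaustive).
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary format, still handled by xlrd
    ".xlsx": "openpyxl",  # no longer supported by xlrd 2.0
    ".xlsm": "openpyxl",
}


def pick_engine(path):
    """Return the pandas engine name for a workbook path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine configured for {suffix!r}") from None


def read_workbook(path, **kwargs):
    """Read a workbook with an explicitly chosen engine."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path), **kwargs)
```

With this in place, `read_workbook('cat.xlsx')` makes the same call as the line above, while a `.xls` path would still go through xlrd.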

The next pandas release, pandas 1.2, will use openpyxl for .xlsx files by default.
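One more thing worth checking, given that the message quotes 'Employee': xlrd prints the first 8 bytes it found where it expected a BOF record, so the file most likely starts with plain text (a CSV or tab-separated export saved with an .xlsx extension) and is not a real workbook. That would also fit the observation that the file worked after going through Excel. A rough way to check is to look at the file signature; `sniff_format` and `sniff_file` below are hypothetical helpers, and the labels they return are made up for illustration.

```python
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx is a zip archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls container


def sniff_format(head):
    """Classify a file from its first bytes (hypothetical helper)."""
    if head.startswith(ZIP_MAGIC):
        return "xlsx-like (zip)"
    if head.startswith(OLE2_MAGIC):
        return "xls (OLE2)"
    return "not an Excel container"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))
```

If `sniff_file` reports that the file is not an Excel container, no Excel engine will read it; `pandas.read_csv` with a suitable `sep` is probably the right tool, though the actual delimiter has to be checked in the file.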

Upvotes: 2
