Jordan
Jordan

Reputation: 23

Pandas and xlrd error while reading excel files

I've been working on a Python script that deals with creating Pandas data frames from Excel files. For the past few days, the Pandas method worked perfectly with the usual pd.read_excel() method.

Today I've been trying to run the same code, but am running into errors. I've tried using the following code on a small test document (just two columns, 5 rows with simple integers):

import pandas as pd

pd.read_excel("tstr.xlsx")

I'm getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\__init__.py", line 130, in open_workbook
    bk = xlsx.open_workbook_2007_xml(
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 812, in open_workbook_2007_xml
    x12book.process_stream(zflo, 'Workbook')
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 266, in process_stream
    for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'

I get the exact same issue when trying to load excel files with xlrd directly. I've tried with several different excel files, and all of my pip installations are up-to-date.

I haven't made any changes to my system since pd.read_excel was last working perfectly (I did reboot my system, but it didn't involve any updates). I'm using a Windows 10 machine, if that's relevant.

Has anyone else had this issue? Any advice on how to proceed?

Upvotes: 2

Views: 7345

Answers (3)

Surasak Sangkhathat
Surasak Sangkhathat

Reputation: 11

This works for me

#Back to linux prompt and install openpyxl

pip install openpyxl

#Add engine='openpyxl' in the python argument

data = pd.read_excel(path, sheet_name='Sheet1', parse_dates=True, engine='openpyxl')

Upvotes: 1

T. Philip
T. Philip

Reputation: 11

openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.

So for me the argument:

  • engine="xlrd" worked on .xls
  • engine="openpyxl" worked on .xlsx

Upvotes: 1

Roim
Roim

Reputation: 3066

There can be many different reasons that cause this error, but you should try add engine='xlrd' or other possible values (mostly "openpyxl"). It may solve your issue, as it depends more on the excel file rather then your code.

Also, try to add full path to the file instead of relative one.

Upvotes: 4

Related Questions