Reputation: 357
I'm trying to read an xlsx file into python using pandas.
I've done this thousands of times before but some reason it is not working with a particular file.
The file is downloaded from another source and I get an AssertionError (see end) when reading with pandas:
df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1")
The variable is defined for the path. The path exists (os.path.exists(path) returns True).
When I copy the contents of the file and paste the values in a new excel doc, this new one will open with the read_excel() method.
When I copy the contents of the file and paste the formatting in a new excel, this new one will open with the read_excel() method.
It doesn't seem to be the values or the formatting.
I am guessing this could be an encoding issue?
Thank you for any help.
df1 = pandas.read_excel(snap1)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
self_do_row(elem)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
assert 0 <= self.rowx < X12_MAX_ROWS
AssertionError
Upvotes: 3
Views: 16184
Reputation: 96
Look at your system for the file xlsx.py.
In your computer it's apparently at C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py
Search the line :
X12_MAX_ROWS = 2 ** 20
and change it so something like
X12_MAX_ROWS = 2 ** 22
This will push the limit of the number of lines from 1 million lines to 4 million lines.
Upvotes: 3
Reputation: 36
Sometimes this can be resolved just by deleting the (blank) lines below your table in Excel.
Upvotes: 0
Reputation: 357
The file contained Korean characters in the text. These needed alternative encoding. Using the "encoding" parameter in the read_excel() method resolved the issue.
df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1", encoding="utf-16")
Upvotes: 0
Reputation: 1
Have encountered the same problem, I save file under xml format: "Save as type: XML Spreadsheet 2003" on window. Then I open the file and save as xlsx format. The new file no longer gives error message.
Upvotes: 0
Reputation: 31
In my case, I was using xlrd package to read excel and I got the same error of Assertion. Open your xlrd package from site-packages, and from that open sheet.py (https://github.com/python-excel/xlrd/blob/master/xlrd/sheet.py)
Find this code in sheet.py
if self.biff_version >= 80:
self.utter_max_rows = 65536
else:
self.utter_max_rows = 16384
Convert the above one into...
#if self.biff_version >= 80:
self.utter_max_rows = 65536
#else:
# self.utter_max_rows = 16384
Now try to run your program... Problem will be solved..:)
Upvotes: 2
Reputation: 7058
Just for completeness, I had a similar problem where the row number of the first row was incorrect, I fixed my problem by changing xlsx-file with code adapted from this answer
def repair_broken_excelfile(zipfname, *filenames, new_name=None):
# https://stackoverflow.com/a/4653863/1562285
import tempfile
import zipfile
import shutil
import os
tempdir = tempfile.mkdtemp()
try:
tempname = os.path.join(tempdir, 'new.zip')
with zipfile.ZipFile(zipfname, 'r') as zipread:
with zipfile.ZipFile(tempname, 'w') as zipwrite:
for item in zipread.infolist():
print('fn: ' + item.filename)
if item.filename not in filenames:
data = zipread.read(item.filename)
zipwrite.writestr(item, data)
else:
data = zipread.read(item.filename)
data = data.replace(b'<row r="0" spans="">', b'<row r="1" spans="">')
zipwrite.writestr(item, data)
pass
if not new_name:
new_name = zipfname
shutil.move(tempname, new_name)
finally:
shutil.rmtree(tempdir)
Apparently there is a fix underway in xlrd
Upvotes: 0