Reputation: 305
I'm using pandas to read .xls files and extract their tables into a DataFrame. (I can open the file in Excel, but it shows a pop-up: ".xls file cannot be accessed. The file may be corrupted, located on a server that is not responding, or read-only.")
In the file's general properties, its type is Microsoft Excel 97-2003 Worksheet (.xls).
Code:
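import os
import pandas as pd

# Folder that contains this script and the .xls files
folder = os.path.dirname(os.path.abspath(__file__))
# Full path of every .xls file in that folder
paths = [os.path.join(folder, name) for name in os.listdir(folder) if name.lower().endswith(".xls")]
excels = [pd.ExcelFile(path) for path in paths]  # Error is raised here
# Read the first sheet of each workbook without a header row
frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]
# Stack all sheets into one DataFrame before writing it out
df = pd.concat(frames, ignore_index=True)
df.to_excel("Final.xls", header=False, index=False)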
Error:
pd.ExcelFile(name) :
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xc1\xc5 \t\xc7\xed\xcf'
or (with read_html):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\html.py", line 545, in _parse_tables
raise ValueError("No tables found")
ValueError: No tables found
However, as the error message says, the first 8 bytes of the file begin with '\xc1\xc5', which is definitely not the Excel .xls format.
Is there any way to process such files?
Upvotes: 0
Views: 659
Reputation: 9
Although I am new to pandas, the first thing I notice is a typo in the line below (it would raise an AttributeError rather than a syntax error). It should be "pd.read_excel".
excels = [pd.read_exel(name) for name in file_path]
The second thing I can say is that some .xls files that look corrupted are really HTML tables saved with an .xls extension, and those can be read with "pd.read_html()". I hope it helps.
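As a rough sketch of how one might decide which reader to try, the snippet below looks at the first bytes of the file before handing it to pandas. The helper names `sniff_format` and `read_any` are made up for illustration, and the HTML check is only a heuristic. It assumes `xlrd`, `openpyxl`, and an HTML parser such as `lxml` are installed for the respective branches.

```python
# Well-known file signatures (the leading bytes of the file).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls (OLE2 container)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx files are ZIP archives


def sniff_format(path):
    """Guess the real format of a file from its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(2048)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Heuristic: skip a UTF-8 BOM and whitespace, then look for HTML markup.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith(b"<") and (b"<html" in text or b"<table" in text):
        return "html"
    return "unknown"


def read_any(path):
    """Read the first table of the file with whichever pandas reader fits (hypothetical helper)."""
    import pandas as pd  # imported here so sniff_format works without pandas

    kind = sniff_format(path)
    if kind == "xls":
        return pd.read_excel(path, header=None, engine="xlrd")
    if kind == "xlsx":
        return pd.read_excel(path, header=None, engine="openpyxl")
    if kind == "html":
        # read_html returns a list of DataFrames, one per <table> element.
        return pd.read_html(path)[0]
    raise ValueError(f"{path}: not an Excel or HTML file, cannot read it with pandas")
```

A file that starts with '\xc1\xc5', like the one in the question, would come back as "unknown" here, which would explain why both `read_excel` and `read_html` fail on it. In that case the file is in some other format, and pandas alone probably cannot read it.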
Upvotes: 0