干猕猴桃

Reputation: 305

Pandas: parsing corrupted .xls file

I'm using pandas to read .xls files and extract their tables into DataFrames. I can open the file in Excel, but it first shows a pop-up: ".xls file cannot be accessed. The file may be corrupted, located on a server that is not responding, or read-only."

In the file's general properties, its type is Microsoft Excel 97-2003 Worksheet (.xls).

Code:

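import glob
import os
import pandas as pd

# Directory that contains this script and the .xls files
file_path = os.path.dirname(os.path.abspath(__file__))

# Open every .xls workbook in that directory
excels = [pd.ExcelFile(name) for name in glob.glob(os.path.join(file_path, "*.xls"))]  # Error

# Parse the first sheet of each workbook without a header row
frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]

# Stack the tables and write them to a single file
df = pd.concat(frames, ignore_index=True)
df.to_excel("Final.xls", header=False, index=False)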

Error:

pd.ExcelFile(name) :

    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xc1\xc5  \t\xc7\xed\xcf'

or (with read_html):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\html.py", line 545, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

However, as the error message shows, the first 8 bytes of the file are '\xc1\xc5' ..., which is definitely not the Excel .xls format.

Is there any way to process such files?
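To make that byte check repeatable, here is a minimal sketch that guesses a file's real format from its leading bytes. The function name `sniff_format` and the returned labels are made up for illustration. The signatures are the usual ones for OLE2 containers (legacy .xls) and ZIP archives (.xlsx). The HTML/XML test is only a rough heuristic.

```python
# Leading signatures of common spreadsheet containers.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP archive


def sniff_format(path):
    """Guess the real format of a file from its first bytes (hypothetical helper)."""
    with open(path, "rb") as handle:
        head = handle.read(512)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Skip a UTF-8 byte order mark and leading whitespace before looking for markup.
    text_start = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text_start.startswith((b"<html", b"<!doctype", b"<table", b"<?xml")):
        return "html-or-xml"
    return "unknown"
```

A file that begins with b'\xc1\xc5  \t\xc7\xed\xcf', as in the traceback, falls through to "unknown": it is neither a real .xls nor an .xlsx, so xlrd cannot be expected to read it.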

Upvotes: 0

Views: 659

Answers (1)

OYTUN ORAL

Reputation: 9

Although I am new to pandas, the first thing I notice is a typo in the line below: it should be "pd.read_excel".

excels = [pd.read_exel(name) for name in file_path]

The second thing I can say is that some "corrupted" .xls files are really HTML tables saved with an .xls extension, and those can be read with "pd.read_html()". I hope it helps.
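If "pd.read_html()" reports "No tables found", as it does in the question, another possibility worth testing is that the file is plain delimited text saved with an .xls extension. The tab byte in the reported header hints at this, but it is only a guess. The sketch below tries `pd.read_csv` with a few candidate encodings. The helper name `read_mislabeled_text`, the tab separator, and the encoding list are all assumptions, not facts about the file.

```python
import pandas as pd


def read_mislabeled_text(path, encodings=("utf-8", "cp1251", "gbk"), sep="\t"):
    """Try to read delimited text that carries an .xls extension (hypothetical helper).

    The encodings and separator are guesses; adjust them to the actual data.
    """
    last_error = None
    for enc in encodings:
        try:
            # header=None mirrors the question's code, which reads without a header row.
            return pd.read_csv(path, sep=sep, header=None, encoding=enc)
        except (UnicodeDecodeError, pd.errors.ParserError) as exc:
            last_error = exc
    raise ValueError(f"could not read {path!r} with encodings {encodings}") from last_error
```

The first encoding that decodes without an error wins, so a wrong guess can still produce garbled text. Check the resulting DataFrame by eye before trusting it.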

Upvotes: 0
