Reputation: 2287
I have one excel file with many sheets. There is only one column in every sheet, which is column A. I plan to read the excel file with read_excel()
method. Hier is the code:
import pandas as PD
ExcelFile = "C:\\AAA.xlsx"
SheetNames = ['0', '1', 'S', 'B', 'U']
# There are five sheets in this excel file. Those are the sheet names.
PageTotal = len(SheetNames)
for Page in range(PageTotal):
df = PD.read_excel(ExcelFile, header=None, squeeze = True, parse_cols = "A" ,sheetname=str(SheetNames[Page]))
print df
#do something with df
The problem is, the for loop
runs only once. By running the second item in the for loop
it shows me the following error text:
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 227, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 824, in open_workbook_2007_xml
x12sst.process_stream(zflo, 'SST')
File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 432, in process_stream_iterparse
for event, elem in ET.iterparse(stream):
File "<string>", line 103, in next
IndexError: pop from empty stack
As a beginner I have no idea about this error. Could anybody please help me to correct the codes? Thanks.
UPDATE Question:If it is because that the excel file contains many formulars and external links, why the for loop
could still run its first item? Confused.
Upvotes: 1
Views: 7730
Reputation: 795
You can simply use:
df = pd.read_excel("C:\\AAA.xlsx", sheet_name=None)
for key, value in df.items():
................
When you set 'sheet_name=None', pandas will automatically read all excel sheets from your workbook. And for iterating over sheets and it's content you can iterate over 'df.items()' and do whatever manipulation you'll have to do. In this above code 'key' is the sheets name and 'value' is the content inside sheet. There is no need to create extra list object, in your case 'sheet_names'. Hope it will solve your issue.
Upvotes: 0
Reputation: 8917
Why are you using sheetname=str(SheetNames[Page])
?
If I understand your question properly I think what you want is:
import pandas as PD
excel_file = r"C:\\AAA.xlsx"
sheet_names = ['0', '1', 'S', 'B', 'U']
for sheet_name in sheet_names:
df = pd.read_excel(excel_file, header=None, squeeze=True, parse_cols="A", sheetname=sheet_name)
print(df)
#do something with df
Upvotes: 1
Reputation: 183
Referring to the answer here: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
Perhaps you can try this:
import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = []
for x in ['0', '1', 'S', 'B', 'U'] :
dfs.append(xls.parse(x))
Or this as a dict instead of list so you can easily get a particular sheet out to work with
import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = {}
for x in ['0', '1', 'S', 'B', 'U'] :
dfs[x] = xls.parse(x)
Upvotes: 0