Chankey Pathak

Reputation: 21676

chunksize keyword of read_excel is not implemented

In version 0.16.1 the chunksize argument was available.

See: http://pandas.pydata.org/pandas-docs/version/0.16.1/generated/pandas.ExcelFile.parse.html

But in the latest version it is no longer available.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.ExcelFile.parse.html

What was the reason that it was removed?

Also, how should I process an Excel file in chunks in the latest version?

I used to do the following:

import pandas as pd

excel = pd.ExcelFile("test.xlsx")

for sheet in excel.sheet_names:
    reader = excel.parse(sheet, chunksize=1000)
    for chunk in reader:
        pass  # process each chunk here

Upvotes: 0

Views: 2062

Answers (1)

Chankey Pathak

Reputation: 21676

As EdChum explained in a comment, this feature was removed in 0.17.0. Chris gave the following reason for the removal:

there's no super-compelling reason; the main idea was to match up with api of to_excel, i.e. the "ExcelFileWrapper" (ExcelFile, ExcelWriter) doesn't have any pandas-specific functionality, instead you pass it into the io functions (read_excel, to_excel).

I did update the docs to cover that specific example. edit: although it may be hard to see in the diff - rendered below.

Source: https://github.com/pandas-dev/pandas/pull/11198

I still wonder whether there is an alternative way to read an Excel file in chunks.
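One possible workaround, offered only as a rough sketch and not as anything pandas itself provides: stream the rows with openpyxl in read-only mode and batch them into DataFrames yourself. The helper names batched_rows and iter_excel_chunks are made up for this illustration, and the sketch assumes an .xlsx file whose first row holds the column names.

```python
from itertools import islice


def batched_rows(rows, chunksize):
    """Yield successive lists of at most `chunksize` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, chunksize))
        if not batch:
            return
        yield batch


def iter_excel_chunks(path, sheet_name, chunksize=1000):
    """Yield DataFrames of at most `chunksize` rows from one worksheet.

    Hypothetical helper: assumes the first row of the sheet is the header.
    """
    # Imported lazily so batched_rows stays usable without these packages.
    import pandas as pd
    from openpyxl import load_workbook

    # read_only=True makes openpyxl stream rows instead of loading the sheet.
    workbook = load_workbook(path, read_only=True)
    try:
        rows = workbook[sheet_name].iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:  # empty sheet, nothing to yield
            return
        for batch in batched_rows(rows, chunksize):
            yield pd.DataFrame(batch, columns=list(header))
    finally:
        workbook.close()
```

With this in place, the old loop would become something like "for chunk in iter_excel_chunks("test.xlsx", sheet, chunksize=1000)". Note that each chunk gets its column types inferred separately, so dtypes may differ between chunks and from what read_excel would produce for the whole sheet.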

Upvotes: 1
