Reputation: 1033
I have an Excel workbook with multiple sheets. Some contain lots of data (e.g. 6,000,000 cells), and some do not. I'm attempting to read one of the significantly smaller sheets, a simple 2-column, 500-row sheet, using the following line of code:
df = pd.read_excel('C:/Data.xlsx', sheetname='Contracts')
However, this read takes an incredibly long time, whereas opening that sheet on its own in Excel does not. Is there a reason for this?
Upvotes: 0
Views: 218
Reputation: 26
I looked through the API documentation to see how the function processes the file but didn't find anything conclusive. A few things of note:
1) Assuming you are using pandas 0.21.0 onwards, you want to use sheet_name instead of sheetname.
2) According to https://realpython.com/working-with-large-excel-files-in-pandas/, pandas' processing speed correlates directly with your system RAM.
3) The read_excel function opens the entire Excel file and only then selects the specific sheet, so the very long sheets get loaded as well. You can test this by copying the short sheet into a separate Excel file and running read_excel on the new file (see the sketch after this list).
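A minimal sketch of points 1) and 3), assuming pandas 0.21.0+; the path and sheet name are the ones from the question, and C:/Contracts_only.xlsx is a hypothetical file you would create yourself by copying the small sheet into its own workbook:

import pandas as pd

# Point 1: with pandas 0.21.0+, the keyword argument is sheet_name (not sheetname)
df = pd.read_excel('C:/Data.xlsx', sheet_name='Contracts')

# Point 3: copy the small sheet into its own workbook (e.g. manually in Excel)
# and time the same call on that file; if this read is fast, the slowdown
# comes from loading the whole original workbook, not the small sheet itself.
df_small = pd.read_excel('C:/Contracts_only.xlsx', sheet_name='Contracts')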
Hope this helps
Upvotes: 1