elke
elke

Reputation: 1270

Efficiently extract sheet names, and column names from large .xlsx with Python3

What are the Python3 options to efficiently (performance and memory) extract sheet names and for a given sheet, and also column names from a very large .xlsx file?

I've tried using pandas:

For sheet names using pd.ExcelFile:

    xl = pd.ExcelFile(filename)
    return xl.sheet_names

For column names using pd.ExcelFile:

    xl = pd.ExcelFile(filename)
    df = xl.parse(sheetname, nrows=2, **kwargs)
    df.columns

For column names using pd.read_excel with and without nrows (>v23):

    df = pd.read_excel(io=filename, sheet_name=sheetname, nrows=2)
    df.columns

However, both pd.ExcelFile and and pd.read_excel seem to read the entire .xlsx in memory and are therefore slow.

Thanks a lot!

Upvotes: 4

Views: 9263

Answers (4)

Jis Mathew
Jis Mathew

Reputation: 175

I think this would help the need

from openpyxl import load_workbook

workbook = load_workbook(filename, read_only=True)

data = {}   #for storing the value of sheet with their respective columns

for sheet in worksheets:
    for value in sheet.iter_rows(min_row=1, max_row=1, values_only=True):
        data[sheet.title] = value #value would be a tuple with headings of each column

Upvotes: 1

karthik bharadwaj
karthik bharadwaj

Reputation: 1

This program lists all the sheets in the excel. Pandas is used here.

import pandas as pd
with pd.ExcelFile('yourfile.xlsx') as xlsx :
    sh=xlsx.sheet_names
print("This workbook has the following sheets : ",sh)

Upvotes: 0

Jade Cacho
Jade Cacho

Reputation: 781

Here is the easiest way I can share with you:

# read the sheet file
import pandas as pd
my_sheets = pd.ExcelFile('sheet_filename.xlsx')
my_sheets.sheet_names

Upvotes: 4

Qusai Alothman
Qusai Alothman

Reputation: 2072

According to this SO question, reading excel files in chunks is not supported (see this issue on github), and using nrows will always read all the file into memory first.

Possible solutions:

  • Convert the sheet to csv, and read that in chunks.
  • Use something other than pandas. See this page for a list of alternative libraries.

Upvotes: 1

Related Questions