bartek15
bartek15

Reputation: 7

Cannot copy large Excel file using Openpyxl

I have a 9000 row x 30 column Excel file with content I would like to copy from one Excel workbook to another. The issue I'm facing, however, is that when I run my code, it never stops running. What's more is that the memory usage keeps going up until it's used 90%-98% of my laptops available memory (screenshot).

I tried using timeit to see how slow my code was, but even that doesn't execute. I refuse to believe it's the size of my Excel file, so it must be an issue with my code. I am unable to pinpoint exactly what it could be though as I have very limited experience with openpyxl and would therefore greatly appreciate any advice. I'm using openpyxl 3.0.4 and Python 3.8.7.

def copy(source, destination):
    # Open source workbook
    wb1 = load_workbook(source)
    ws1 = wb1.worksheets[0]

    # Open destination workbook
    wb2 = load_workbook(destination)
    ws2 = wb2.active

    # Get rows and columns of source
    max_rows = ws1.max_row
    max_columns = ws1.max_column

    # Copy cells from source to destination
    for i in range(1, max_rows + 1):
        for j in range(1, max_columns + 1):
            # Read from source
            c = ws1.cell(row=i, column=j)

            # Write to destination file
            ws2.cell(row=i, column=j).value = c.value

    # Save source
    wb2.save(str(destination))

copy_src = r"C:\Users\username\Documents\copy_from.xlsx"
copy_dst = r"C:\Users\username\Documents\copy_to.xlsx"
time = timeit.timeit('%s' % copy(copy_src, copy_dst))
print("Execution time for copy(): %f seconds" % time)

Upvotes: 0

Views: 376

Answers (2)

Charlie Clark
Charlie Clark

Reputation: 19497

The worksheet is not large. If the workbook is 261 MB (please include this information in the description) then it must include a number of very large objects which openpyxl is reading into memory. For copying only read-only mode should be sufficient.

Upvotes: 1

ExistingAsMike
ExistingAsMike

Reputation: 37

How big is the Excel file? Not by row/column but by the size on disk? Whenever I've worked with openpyxl in the past I've always had issues with "empty, formatted cells", you may be able to see these more clearly by opening the excel file as XML: (How to view the XML form of an Excel file?).

Easiest way to fix is to open the source excel file (in Excel), select all blank cells and use the "Clear All" button in the tool bar or just hitting delete usually works as well.

Upvotes: 0

Related Questions