Amir Shabani
Amir Shabani

Reputation: 4197

Python - Modifying existing excel using Pandas and openpyxl

I have an Excel file (Celebrities.xlsx) with multiple sheets and I'm trying to modify a single sheet called Relationships without modifying (or potentially erasing) other sheets. Here's what I've done.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Name of the celebrity that I want to modify
celeb_name = 'Terence Stamp'

wb = load_workbook('Celebrities.xlsx')
ws = wb['Relationships']

df = pd.read_excel('Celebrities.xlsx', sheet_name='Relationships')

# This part is trivial, but basically I'm replacing every null cell in 'Link' column with the word 'empty' (of that particular celebrity)
df.loc[(df['Celebrity Name'] == celeb_name) & (df['Link'].isnull()), 'Link'] = 'empty'

for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)

wb.save('new.xlsx')

Now the script runs without any error and new.xlsx is created successfully, but when I try to open it, it gives me this error:

Warning loading document new.xlsx: The data could not be loaded completely because the maximum number of rows per sheet was exceeded.

And nothing has been modified!

I can assure that this part of the code works perfectly:

wb = load_workbook('Celebrities.xlsx')
ws = wb['Relationships']
wb.save('new.xlsx')

I suppose the problem is with this part of code:

for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)

But I don't know how to fix it.

Upvotes: 1

Views: 1905

Answers (1)

BoarGules
BoarGules

Reputation: 16942

You say in your question nothing has been modified. But it has. Your code is looping through the dataframe and adding a new row to the worksheet each time through the loop. The limit is 1,048,576 rows and Excel is telling you that the modified worksheet exceeds that limit.

Upvotes: 3

Related Questions