xlmaster
xlmaster

Reputation: 721

pandas appending excel(xlsx) file gives attribute.error

Getting trouble with either of errors ; writer.book=book AttributeError: can't set attribute 'book' or BadZipFile

for code not giving badzipfile error I put code line which writes excel file first, dataOutput=pd.DataFrame(dictDataOutput,index=[0]) but, even though I cannot get rid of writer.book = book AttributeError: can't set attribute 'book' As one of SO answer suggests I need to bring back openpyxl to previous versions, or to work with CSV file not excel. I think that's not the solution. There should be solution which I could not get in

dataOutput=pd.DataFrame(dictDataOutput,index=[0])
dataOutput.to_excel('output.xlsx') 'output.xlsm'
book = load_workbook('output.xlsx') 'output.xlsm'
writer = pd.ExcelWriter('output.xlsx')OR'output.xlsm'#,engine='openpyxl',mode='a',if_sheet_exists='overlay')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}

for sheetname in writer.sheets:
    dataOutput.to_excel(writer,sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index = False,header= False)

writer.save()

I looked for an answer in enter link description here and in detailed solution of attributeError in enter link description here

---I tried another way

with pd.ExcelWriter('output.xlsx', mode='a',if_sheet_exists='overlay') as writer:
    dataOutput.to_excel(writer, sheet_name='Sheet1')
    writer.save()

But this time gave another error

FutureWarning: save is not part of the public API, usage can give in unexpected results and will be removed in a future version

writer.save()

after @Andrew's comments I chaned my code to this way;

with pd.ExcelWriter('outputData.xlsm', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    book = load_workbook('outputData.xlsm', keep_vba=True)

    writer.book = book
    writer.sheets = {ws.title: ws for ws in book.worksheets}
    current_sheet = book['Sheet1']
    Column_A = current_sheet['A']
    maxrow = max(c.row for c in Column_A if c.value is not None)

    for sheetname in writer.sheets:
        AllDataOutput.to_excel(writer, sheet_name=sheetname, startrow=maxrow, index=False, header=False)

Upvotes: 3

Views: 5783

Answers (1)

Andrew
Andrew

Reputation: 799

There are a few things that are unclear about your question, but I'll do my best to try and answer what I think you're trying to do.

First off, things that won't work:

  1. From the to_excel docs, "If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object", so when you run dataOutput.to_excel('output.xslx'), you're creating an excel workbook with only a single sheet. Later, it seems like you're trying to write multiple sheets, but you'll only be able to write a single sheet since that's all that's in the workbook to begin with.
  2. From the ExcelWriter docs, .save() is deprecated and .close() should be used instead. But it's recommended to use a context manager (with ... as ... :) so you can entirely avoid having to do that.
  3. By default ExcelWriter uses w (write) mode, so the BadZipFile error that you were running into earlier was likely related to the order that things were run. If you had written a book with to_excel, then loaded it with openpyxl, then used ExcelWriter, you'd write, load, then immediately overwrite the file with a blank one. If you then tried loading the book again without recreating it with to_excel, you'd be trying to load an empty file, which is why that error was thrown.

The answers that you linked to in your question seem to use a much older version of pandas with much different behavior surrounding interaction with Excel files. The latest versions seem to simplify things quite a bit!

As an example solution, I'm going to assume you have two DataFrames called df1 and df2 that you'd like to write to Sheet_1 and Sheet_2 in output.xlsx, and you'd like to update the data in those sheets (using the overlay option) with additional DataFrames called df3 and df4. To confirm, I have openpyxl = 3.0.10 and pandas = 1.5.0 installed.

As you have things written currently, there aren't any updates in dataOutput that would be reflected in output.xslx, but I'm going to create a dictionary called df_dict that can hold the DataFrames and be updated based on the sheets they should be written to. I'm sure you can adjust to suit your needs using whatever data structure you prefer!

df1 = pd.DataFrame(data = {'A': ['a', 'b', 'c'], 'B': ['g','h','i']})
df2 = pd.DataFrame(data = {'C': ['o', 'p', 'q'], 'D': ['u','v','w']})

df_dict = {'Sheet_1': df1, 'Sheet_2': df2}

with pd.ExcelWriter('output.xlsx') as writer:
    for sheetname, dfname in df_dict.items():
        dfname.to_excel(writer, sheet_name = sheetname)

# Updated data is put in DataFrames
df3 = pd.DataFrame(data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'], 'B': ['g', 'h', 'i', 'j', 'k', 'l']})
df4 = pd.DataFrame(data = {'C': ['o', 'p', 'q', 'r', 's', 't'], 'D': ['u', 'v', 'w', 'x', 'y', 'z']})

# df_dict is updated with the new DataFrames
df_dict = {`Sheet_1`: df3, 'Sheet_2': df4}

# This block now uses the name of the sheet from the workbook itself to get the updated data.
# You could also just run the same block as above and it would work.
with pd.ExcelWriter('output.xlsx', mode = 'a', if_sheet_exists = 'overlay') as writer:
    for sheet in writer.sheets:
        df_dict[sheet].to_excel(writer, sheet_name = sheet)

Upvotes: 7

Related Questions