Basj
Basj

Reputation: 46381

Modify an Excel file with Pandas, with minimal change of the layout

I've already read Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file? but here my question is specific to the layout mentioned hereafter.

How to open an Excel file with Pandas, do some modifications, and save it back:

?

Here is what I tried, it's a short example (in reality I do more processing with Pandas):

import pandas as pd

df = pd.read_excel('in.xlsx')
df['AB'] = df['A'].astype(str) + ' ' + df['B'].astype(str)  # create a new column from 2 others
del df['Date']                                              # delete columns
del df['Time']
df.to_excel('out.xlsx', index=False)

With this code, the Filter of the first row is removed and the displayed column width are set to a default, which is not very handy (because we would have to manually set the correct width for all columns).

Upvotes: 5

Views: 7933

Answers (4)

Umar.H
Umar.H

Reputation: 23099

If you are using a machine that has Excel installed on it, then I highly recommend using the flexible xlwings API. This answers all your questions.

Let's assume I have an Excel file called demo.xlxs in the same directory as my program.

enter image description here

app.py

import xlwings as xw # pip install xlwings
import pandas as pd

wb = xw.Book('demo.xlsx')

enter image description here

This will create a initiate an xl workbook instance and open your Excel editor to allow you to invoke Python commands.

enter image description here


Let's assume we have the following dataframe that we want to use to replace the ID and Name column:

    new_name
A   John_new
B  Adams_new
C     Mo_new
D  Safia_new

wb.sheets['Sheet1']['A1:B1'].value = df

enter image description here


Finally, you can save and close.

wb.save()
wb.close()

Upvotes: 5

Basj
Basj

Reputation: 46381

This is a solution for (1), (2), but not (3) from my original question. (If you have an idea for (3), a comment and/or another answer is welcome).

In this solution, we open the input Excel file two times:

  • once with openpyxl: this is useful to keep the original layout (which seems totally discarded when reading as a pandas dataframe!)
  • once as a pandas dataframe df to benefit from pandas' great API to manipulate/modify the data itself. Note: data modification is handier with pandas than with openpyxl because we have vectorization, filtering df[df['foo'] == 'bar'], direct access to the columns by name df['foo'], etc.

The following code modifies the input file and keeps the layout: the first row "Filter" is not removed and the column width of each colum is not modified.

import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
wb = load_workbook('test.xlsx')  # load as openpyxl workbook; useful to keep the original layout
                                 # which is discarded in the following dataframe
df = pd.read_excel('test.xlsx')  # load as dataframe (modifications will be easier with pandas API!)
ws = wb.active
df.iloc[1, 1] = 'hello world'    # modify a few things
rows = dataframe_to_rows(df, index=False)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)
wb.save('test2.xlsx')

Upvotes: 2

eliu
eliu

Reputation: 2479

I would recommend xlwings, as it interfaces with excel's COM interfaces (like built-in vba), so it is more powerful. I never tested the "preservation of filtering or formula", official doc may provide ways.

For my own use, I just build everything into python, filtering, formulas, so I don't even touch the excel sheet.

Demo:

# [step 0] boiler plate stuff
df = pd.DataFrame(
    index=pd.date_range("2020-01-01 11:11:11", periods=100, freq="min"), 
    columns=list('abc'))
df['a'] = np.random.randn(100, 1) 
df['b'] = df['a'] * 2 + 10

# [step 1] google xlwings, and pip/conda install xlwings

# [step 2] open a new excel sheet, no need to save 
#        (basically this code will indiscriminally wipe whatever sheet that is active on your desktop)

# [step 3] magic, ...and things you can do
import xlwings as xw

wb = xw.books.active
ws = wb.sheets.active
ws.range('A1').current_region.options(index=1).value = df 
# I believe this preserves existing formatting, HOWEVER, it will destory filtering

if 1:
# show casing some formatting you can do
    active_window = wb.app.api.ActiveWindow
    active_window.FreezePanes = False
    active_window.SplitColumn = 2 # const_splitcolumn
    active_window.SplitRow = 1
    active_window.FreezePanes = True


    ws.cells.api.Font.Name = 'consolas'
    ws.api.Rows(1).Orientation = 60
    ws.api.Columns(1).Font.Bold = True
    ws.api.Columns(1).Font.ColorIndex = 26
    ws.api.Rows(1).Font.Bold = True
    ws.api.Rows(1).Borders.Weight = 4
    ws.autofit('c') # 'c' means columns, autofitting columns
    ws.range(1,1).api.AutoFilter(1)

Upvotes: 2

Glauco
Glauco

Reputation: 1465

I think this is not field of pandas, you must use openpyxl in order to take care of all formatting, blocked_rows, name ranges and so on. Main difference is that you cannot use vectorial computation as in pandas so you need to introduce some loop.

Upvotes: 0

Related Questions