Reputation: 41509
I need to read an excel file using load_workbook
from openpyxl
. Some cells are merged in this file. This will return NaN values to some of the merged cells. I would like to have all merged cells the same value of the file. Here I tried to create a reproducible example from this simple excel file:
Code:
import pandas as pd
import numpy as np
from openpyxl import load_workbook
workbook = load_workbook("example.xlsx", data_only=True)
sheet = workbook.active
# Create dataframe
max_row = sheet.max_row
max_col = sheet.max_column
df = pd.DataFrame(np.nan, index=range(1, max_row + 1), columns=range(1, max_col + 1))
# Iterate through all cells in the sheet
for row in sheet.iter_rows():
for cell in row:
df.at[cell.row, cell.column] = cell.value
print(df)
Output:
1 2 3 4 5
1 merged NaN long_merged NaN NaN
2 1 4.0 A NaN NaN
3 2 5.0 B NaN NaN
4 3 NaN None NaN NaN
As you can see it returns some NaN to the merged cells. My expected output should be like this:
1 2 3 4 5
1 merged merged long_merged long_merged long_merged
2 1 4.0 A NaN NaN
3 2 5.0 B NaN NaN
4 3 NaN None NaN NaN
So I was wondering how we can return the same value to all cells of the merged cell as in the expected output?
Upvotes: 0
Views: 47
Reputation: 106
Try adding the following code between sheet = workbook.active
and # Create dataframe
.
for r in list(sheet.merged_cells.ranges):
cl,rl,ch,rh = r.bounds
sheet.unmerge_cells(str(r))
for i in range(cl, ch + 1):
sheet.cell(row = rl, column = i).value = sheet.cell(rl, cl).value
sheet.merged_cells.ranges
finds all ranges of merged cells on sheet
. Then r
iterates through the found ranges. The use of list
prevents unmerge_cells()
(see below) from causing an error.
cl,rl,ch,rh = r.bounds
assigns column low, row low, column high and row high of the merged range r
to cl
, rl
, ch
, rh
, respectively. Excel stores the value displayed in the merged cells in cell rl:cl
of the range. This is used later.
It is impossible to assign values to merged cells. They must be unmerged first. sheet.unmerge_cells(str(r))
unmerges the merged range r
. unmerge_cells()
takes string
as its argument, therefore r
is typecast to str
.
The i
cycle iterates through the columns of the unmerged range from column low (cl
) to column high (ch
). The last line assigns value in cell rl:cl
to all cells in the columns.
If the merged cells ranges include more than one row, the code can be modified to add another cycle to iterate through rows:
for j in range(rl, rh + 1):
for i in range(cl, ch + 1):
sheet.cell(row = j, column = i).value = sheet.cell(rl, cl).value
Upvotes: 1