Quinten
Quinten

Reputation: 41509

Merged cells return NaN in load_workbook openpyxl

I need to read an excel file using load_workbook from openpyxl. Some cells are merged in this file. This will return NaN values to some of the merged cells. I would like to have all merged cells the same value of the file. Here I tried to create a reproducible example from this simple excel file:

enter image description here

Code:

import pandas as pd
import numpy as np
from openpyxl import load_workbook

workbook = load_workbook("example.xlsx", data_only=True)
sheet = workbook.active

# Create dataframe
max_row = sheet.max_row
max_col = sheet.max_column
df = pd.DataFrame(np.nan, index=range(1, max_row + 1), columns=range(1, max_col + 1))

# Iterate through all cells in the sheet
for row in sheet.iter_rows():
    for cell in row:
        df.at[cell.row, cell.column] = cell.value

print(df)

Output:

        1    2            3   4   5
1  merged  NaN  long_merged NaN NaN
2       1  4.0            A NaN NaN
3       2  5.0            B NaN NaN
4       3  NaN         None NaN NaN

As you can see it returns some NaN to the merged cells. My expected output should be like this:

        1       2            3           4           5
1  merged  merged  long_merged long_merged long_merged
2       1     4.0            A         NaN         NaN
3       2     5.0            B         NaN         NaN
4       3     NaN         None         NaN         NaN

So I was wondering how we can return the same value to all cells of the merged cell as in the expected output?

Upvotes: 0

Views: 47

Answers (1)

Man made of meat
Man made of meat

Reputation: 106

Try adding the following code between sheet = workbook.active and # Create dataframe.

for r in list(sheet.merged_cells.ranges):
    cl,rl,ch,rh = r.bounds
    sheet.unmerge_cells(str(r))
    for i in range(cl, ch + 1):
        sheet.cell(row = rl, column = i).value = sheet.cell(rl, cl).value

sheet.merged_cells.ranges finds all ranges of merged cells on sheet. Then r iterates through the found ranges. The use of list prevents unmerge_cells() (see below) from causing an error.

cl,rl,ch,rh = r.bounds assigns column low, row low, column high and row high of the merged range r to cl, rl, ch, rh, respectively. Excel stores the value displayed in the merged cells in cell rl:cl of the range. This is used later.

It is impossible to assign values to merged cells. They must be unmerged first. sheet.unmerge_cells(str(r)) unmerges the merged range r. unmerge_cells() takes string as its argument, therefore r is typecast to str.

The i cycle iterates through the columns of the unmerged range from column low (cl) to column high (ch). The last line assigns value in cell rl:cl to all cells in the columns.

If the merged cells ranges include more than one row, the code can be modified to add another cycle to iterate through rows:

for j in range(rl, rh + 1):
    for i in range(cl, ch + 1):
        sheet.cell(row = j, column = i).value = sheet.cell(rl, cl).value

Upvotes: 1

Related Questions