jubilee
jubilee

Reputation: 13

Python/Openpyxl: Merge empty row cells delimited by string

I am trying to create a script using python and openpyxl to open up a given excel sheet and merge all cells in a given row together until the script finds a cell containing a string. The row placement is always the same, but the number of columns and the column placement of the strings is not so it needs to be dynamic. Once a new string is found, I want to continue to merge cells until the column that is right before the grand total. There are also cases where the cell doesn't need to be merged, because there is no empty cell in the data set to merge it with.

I found this answer here, which is doing a similar procedure except it is merging rows instead of columns. I was able to refactor part of this to create a list of the cells that have strings in my workbook, but am struggling on next steps. Any thoughts?

import openpyxl
from openpyxl.utils import get_column_letter
from openpyxl import Workbook

wb1 = openpyxl.load_workbook('stackoverflow question.xlsx')
ws1 = wb1.worksheets['ws1']

columns_with_strings = []
merge_row = '3'   #the data to merge will always be in this row

for col in range (2, ws1.max_column-1):
    for row in merge_row: 
        if ws1[get_column_letter(col) + merge_row].value != None:
            columns_with_strings.append(str(get_column_letter(col) + merge_row)

The above code yields this list which includes the correct cells that contain strings and need to be checked for merging:

['C3', 'F3', 'J3']

This is how the workbook looks now: Before

This is how I am trying to get it to look in the end: After

Upvotes: 0

Views: 460

Answers (3)

Charlie Clark
Charlie Clark

Reputation: 19497

Based on @timeless' answer I've cleaned the code up a bit to make better use of Python's tools and the openpyxl API

from openpyxl import Workbook
wb = Workbook()
ws = wb.active

ws.append([])
ws.append([])
ws.append([None, None, "Group A", None, None, "Group B", None, None, None, "Group C"])

# get column indices for header cells
headings = [cell.column for cell in next(ws.iter_rows(min_row=3, max_row=3)) if cell.value]

from openpyxl.styles import Alignment, PatternFill, NamedStyle

fill = PatternFill(patternType="solid", fgColor="DDDDDD")
alignment = Alignment(horizontal="center")
header_style = NamedStyle(alignment=alignment, fill=fill, name="Header")
wb.named_styles.append(header_style)

from itertools import zip_longest
# create ranges for merged cells from the list of header cells: the boundary of the first range, is the index of the start of the next minus 1. Use zip_longest for the final value
for start_column, end_column in zip_longest(headings, headings[1:], fillvalue=headings[-1]+1):
   ws.cell(3, start_column).style = header_style
   ws.merge_cells(start_row=3, end_row=3, start_column=start_column, end_column=end_column-1)
   

wb.save("merged.xlsx")

Using the API wherever possible generally leads to more manageable and generic code.

Upvotes: 0

Timeless
Timeless

Reputation: 37737

To complete your code, you can use worksheet.merge_cells with worhseet.cell.alignment:

from openpyxl import load_workbook
from openpyxl.styles import Alignment

wb = load_workbook("tmp/stackoverflow question.xlsx")
ws = wb["Sheet1"]

merge_row = 3

#here, we get the columns idx for every non null cell in row 3
#and after that, we make a text alignment (center) in the last cell
idx_col_strings = [cell.column for cell in ws[merge_row] if cell.value]
ws.cell(3, idx_col_strings[-1]).alignment = Alignment(horizontal="center")

#here, we loop through each range until the last non null cell in row 3
#then, we make a merge as much as the number of transitions (non null => null)
#and finally, we make a text alignement (center) for each cell/merge
for i in range(len(idx_col_strings)-1):
    start_col, end_col = idx_col_strings[i], idx_col_strings[i+1]-1
    ws.merge_cells(start_row=merge_row, start_column=start_col,
                   end_row=merge_row, end_column=end_col)
    ws.cell(merge_row, start_col).alignment = Alignment(horizontal="center")
    
wb.save("tmp/stackoverflow answer.xlsx")

BEFORE :

enter image description here

AFTER :

enter image description here

Upvotes: 1

AaronTook
AaronTook

Reputation: 63

To start, if you aren't familiar with openpyxl's merge and unmerge functions, I recommend your read about them in the documentation (https://openpyxl.readthedocs.io/en/stable/usage.html#merge-unmerge-cells) to get a sense of how this works.

Here is base code that should provide the functionality you are wanting, but some values may need tweaked for your device or spreadsheet.

    import openpyxl # Necessary imports.
    from openpyxl.utils import get_column_letter
    from openpyxl.utils.cell import coordinate_from_string
    from openpyxl.utils.cell import column_index_from_string
    from openpyxl import Workbook
    
    wb1 = openpyxl.load_workbook('stackoverflow question.xlsx') # Start of your code.
    ws1 = wb1.worksheets[0]

columns_with_strings = []
merge_row = '3'   #the data to merge will always be in this row

for col in range (2, ws1.max_column):
    for row in merge_row: 
        if ws1[get_column_letter(col) + merge_row].value != None:
            columns_with_strings.append(str(get_column_letter(col) + merge_row)) # End of your code.

prior_string = columns_with_strings[0] # Set the "prior_string" to be the first detected string.
for cell in columns_with_strings:
    coords = coordinate_from_string(cell) # Split "prior_string" into the letter and number components.
    if column_index_from_string(coords[0]) >1:
        prior = str(get_column_letter(column_index_from_string(coords[0])-1)) + str(coords[1]) # Get the cell that is left of the cell "prior_string"
        if prior > prior_string:
            ws1.merge_cells(f'{prior_string}:{prior}') # Merge the cells.
            
        prior_string=cell # Set the current string to be the prior string.

ws1.merge_cells(f'{cell}:{get_column_letter(ws1.max_column)+str(coords[1])}') # Merge the last string to the end (the last column).
wb1.save("stackoverflow question.xlsx") # Save the file changes.

I hope this helps to point you in the right direction!

Upvotes: 0

Related Questions