Reputation: 15
I've got an excel sheet that has values such as:
ProductID | Reference | Component used | DESCRIPTION |
---|---|---|---|
5325646 | PD-42#23 | ASDGE34#W-AL | HOUSING PLATE |
4563756 | ITA-COPPER | RAW AL | WASHER |
4345432 | ASDGE34#W-AL | ZK23456 | 3MM X 2MM X 10MM COPPER BAR |
1245456 | ASDGE34#W-AL | ZK66634 | 1MM X 2MM X 9MM COPPER BAR |
What I essentially want to do is match anything in "Component" with "Reference" and if they come true, like in this example, ASDGE34#W-AL matches in "Component" and in "Reference", so I'd want to copy the details over and shift it after "Description". The end result that I want to achieve is this.
ProductID | Reference | Component used | DESCRIPTION | Child Component ID | DESCRIPTION |
---|---|---|---|---|---|
5325646 | PD-42#23 | ASDGE34#W-AL | HOUSING PLATE | ZK23456 & ZK654534 | 3MM X 2MM X 10MM COPPER BAR &1MM X 2MM X 9MM COPPER BAR |
4563756 | ITA-COPPER | ZAAGD0545 | ALUMINIUM BAR | ||
5656565 | 0788866-7#D | YN435611 | DUSTER |
Can anyone help with this? I've been using Python to try and iterate a loop over the "Components used" and the "Reference" column seeing what match but no luck so far.
Upvotes: 0
Views: 1112
Reputation: 4860
I have code that does something similar. I've changed it to suit your problem.
The products are stored in 2 separate dict
list
s, accessed by Reference and Component used. You can easily iterate over the products_by_reference
dict using a Component used value to get a list of products with a matching reference.
Note that this implementation assumes each column header is unique, as each row is stored as a dict where each column header is used as the key to access the corresponding value, so if a header is used twice, the first value will be overwritten. Because of this, I manually rename the header Child DESCRIPTION to DESCRIPTION after writing the header columns.
import os
from collections import defaultdict
import openpyxl
# Load the existing spreadsheet.
wb = openpyxl.load_workbook(filename="Products.xlsx")
ws = wb.worksheets[0]
# Style used for the header row
bold_font = openpyxl.styles.Font(bold=True)
grey_fill = openpyxl.styles.fills.PatternFill("solid", fgColor="F8F9F9")
# Store the header column names and numbers.
# Key = column name, value = column number. e.g. "ProductID": 1
header_columns = {}
for column_num in range(1, ws.max_column + 1):
column_name = ws.cell(row=1, column=column_num).value
header_columns[column_name] = column_num
# Read in all the values, starting after the header row.
# Store each product as a dict, and store every product in 2 separate
# dict of lists, with the keys being the reference and component used.
products_by_reference = defaultdict(list)
products_by_component = defaultdict(list)
for row_num in range(2, ws.max_row + 1):
product = {}
for column_name, column_num in header_columns.items():
product[column_name] = ws.cell(row=row_num, column=column_num).value
reference = product["Reference"]
component = product["Component used"]
products_by_reference[reference].append(product)
products_by_component[component].append(product)
# Iterate over the products by component, and copy and join all the
# components used and descriptions to the relevant product's dict.
for component, products in products_by_component.items():
# Use dicts here instead of sets if preserving the order is important.
child_components = set()
child_descriptions = set()
for product in products_by_reference[component]:
child_components.add(product["Component used"])
child_descriptions.add(product["DESCRIPTION"])
child_components_string = " & ".join(child_components)
child_descriptions_string = " & ".join(child_descriptions)
for product in products:
product["Child Component ID"] = child_components_string
product["Child DESCRIPTION"] = child_descriptions_string
# products_by_reference will be used later for the new spreadsheet, and
# we don't want to include any products that have a reference which is a
# component used by another product, since they have now been merged with
# that other product.
del products_by_reference[component]
# Write the products to a new spreadsheet.
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Products"
# Freeze the header row.
ws.freeze_panes = "A2"
# Add the extra headers.
column_num = len(header_columns)
header_columns["Child Component ID"] = column_num + 1
header_columns["Child DESCRIPTION"] = column_num + 2
# Write the headers.
for column_num, column_name in enumerate(header_columns, start=1):
cell = ws.cell(row=1, column=column_num, value=column_name)
cell.font = bold_font
cell.fill = grey_fill
# This implementation doesn't allow there to be duplicate headers, as the
# headers are used as keys to retrieve values from the product dicts. So the
# the last header is manually changed from "Child DESCRIPTION" to
# "DESCRIPTION" after it has been written to the sheet.
cell.value = "DESCRIPTION"
# Flatten the dictionary of lists, and write every product to the spreadsheet.
products = [product for products_list in products_by_reference.values() for product in products_list]
for row_num, product in enumerate(products, start=2):
for column_name, column_num in header_columns.items():
cell = ws.cell(row=row_num, column=column_num, value=product.get(column_name))
# Save the spreadsheet to file, then launch it.
filename = "Merged products.xlsx"
wb.save(filename)
os.startfile(filename)
Upvotes: 1