StupendousEnzio
StupendousEnzio

Reputation: 63

Match unique column from two .xlsx files, if match update else append using openpyxl

I have two excel files, master and child. All the column names are same in both the files.

I want to match column C of both the files, if there is a match then update all the columns for that specific row and if there is no match append the row at the end of the Master file.

I want to update data from child file to Master file based on this logic. So far I am able to update the Master file by copying all the data from Child to Master, but the data in Master file is getting replaced from Child file for that specified range. Any help will be much appreciated.

import openpyxl 

Master = openpyxl.load_workbook("Master.xlsx")
Child = openpyxl.load_workbook("Child.xlsx")

Master_File = Master["Sheet1"]
Child_File = Child["Sheet1"]

Function to copy rows and columns from Child File

def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    #Loop through selected rows------------------------------------------------
    for i in range(startRow, endRow + 1, 1):
        #Appends to row selected list--------------------------------------------
        rowSelected = []
        for j in range(startCol, endCol + 1, 1):
            rowSelected.append(sheet.cell(row=i, column=j).value)
        #Adds the rowSelected list and nests inside rangesSelected list------------------------------
        rangeSelected.append(rowSelected) 
    return rangeSelected

Function to paste all the data to Master File

def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving, copiedData):
    countRow = 0
    for i in range(startRow, endRow + 1, 1):
        countCol = 0
        for j in range(startCol, endCol + 1, 1):

            sheetReceiving.cell(row=i, column=j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1

Main function

def createData():
    wb = Workbook()
    print("Your data is being Processed.....")
    selectedRange = copyRange(1,10,39,45, Child_File)
    pastingRange = pasteRange(1,10,39,45, Master_File, selectedRange)
    Master.save(r"Final.xlsx")
    print("Range copied and pasted")





Final = createData()

Upvotes: 0

Views: 237

Answers (1)

Chris
Chris

Reputation: 16172

This demonstrates how you can take two dataframes (which could both be created from .read_excel() in pandas), set your index column as the column you want to match on, and update the original with matches from the second dataframe, then write to xlsx again.

import pandas as pd
#df = pd.read_excel('myfile1.xlsx')
df = pd.DataFrame({'C': [1, 2, 3],
                   'D': [400, 500, 600]})
#new_df = pd.read_excel('myfile2.xlsx')
new_df = pd.DataFrame({'C': [1, 2, 6],
                       'D': [7, 8, 9]})

df.set_index('C', inplace=True)
df.update(new_df.set_index('C'))

df.update(new_df)

df.reset_index().to_excel('updated.xlsx', index=False)

Output

    C   D
0   1   8.0
1   2   9.0
2   3   600.0

Upvotes: 0

Related Questions