Reputation: 63
I have two excel files, master and child. All the column names are same in both the files.
I want to match column C of both the files, if there is a match then update all the columns for that specific row and if there is no match append the row at the end of the Master file.
I want to update data from child file to Master file based on this logic. So far I am able to update the Master file by copying all the data from Child to Master, but the data in Master file is getting replaced from Child file for that specified range. Any help will be much appreciated.
import openpyxl
Master = openpyxl.load_workbook("Master.xlsx")
Child = openpyxl.load_workbook("Child.xlsx")
Master_File = Master["Sheet1"]
Child_File = Child["Sheet1"]
Function to copy rows and columns from Child File
def copyRange(startCol, startRow, endCol, endRow, sheet):
rangeSelected = []
#Loop through selected rows------------------------------------------------
for i in range(startRow, endRow + 1, 1):
#Appends to row selected list--------------------------------------------
rowSelected = []
for j in range(startCol, endCol + 1, 1):
rowSelected.append(sheet.cell(row=i, column=j).value)
#Adds the rowSelected list and nests inside rangesSelected list------------------------------
rangeSelected.append(rowSelected)
return rangeSelected
Function to paste all the data to Master File
def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving, copiedData):
countRow = 0
for i in range(startRow, endRow + 1, 1):
countCol = 0
for j in range(startCol, endCol + 1, 1):
sheetReceiving.cell(row=i, column=j).value = copiedData[countRow][countCol]
countCol += 1
countRow += 1
Main function
def createData():
wb = Workbook()
print("Your data is being Processed.....")
selectedRange = copyRange(1,10,39,45, Child_File)
pastingRange = pasteRange(1,10,39,45, Master_File, selectedRange)
Master.save(r"Final.xlsx")
print("Range copied and pasted")
Final = createData()
Upvotes: 0
Views: 237
Reputation: 16172
This demonstrates how you can take two dataframes (which could both be created from .read_excel() in pandas), set your index column as the column you want to match on, and update the original with matches from the second dataframe, then write to xlsx again.
import pandas as pd
#df = pd.read_excel('myfile1.xlsx')
df = pd.DataFrame({'C': [1, 2, 3],
'D': [400, 500, 600]})
#new_df = pd.read_excel('myfile2.xlsx')
new_df = pd.DataFrame({'C': [1, 2, 6],
'D': [7, 8, 9]})
df.set_index('C', inplace=True)
df.update(new_df.set_index('C'))
df.update(new_df)
df.reset_index().to_excel('updated.xlsx', index=False)
Output
C D
0 1 8.0
1 2 9.0
2 3 600.0
Upvotes: 0