Reputation: 183
[Someone suggested that this is a duplicate of Pandas Merging 101. I disagree. That page talks only about merging two existing pandas. It does not explain how to do a comparison between excel and an existing panda. Even if i did a join with two dataframes, it appears that that would require even more work. I was trying to find a way to reduce how much code I need to write.]
i'm extremely new to Pandas and Dataframes, so i'm hoping I could get some suggestions on how to improve a script I wrote.
I'm trying to write a program that will find rows in a dataframe which correspond to rows on an excel sheet. The rows in the dataframe won't be in order with the rows in excel, and in many cases may not even exist. The important thing is that the values in the appropriate columns match. Once these matches are identified, I want to then add only the matching rows in the dataframe to the excel sheet.
I do have a script that works, but its not pretty, and I know there has to be a better way to do this.
Here is an example of the code I wrote with the corresponding information. Any suggestions on how to make this better would be appreciated.
import pandas as pd
import openpyxl
from openpyxl import load_workbook
wb =load_workbook("test.xlsx")
ws = wb["testsheet"]
data = {'ColA':['BA', 'AG', 'BC', 'AS', 'BW', 'AA'],
'ColB':['BA', 'AH', 'BC', 'AS', 'BW', 'AB'],
'ColC':['BA', 'AI', 'BC', 'AS', 'BW', 'AC']}
df = pd.DataFrame(data)
row_count = len(ws['A'])
for i in range(2, row_count +1):
for k in range(len(df)):
if ((ws.cell(i,1).value == df.iloc[k]['ColA']) and
(ws.cell(i,2).value == df.iloc[k]['ColB']) and
(ws.cell(i,3).value == df.iloc[k]['ColC'])):
for j in range(0,3):
ws.cell(i, j + 5).value = df.iloc[k][j]
wb.save("test.xlsx")
Here's the excel sheet "testsheet":
Upvotes: 1
Views: 295
Reputation: 5992
len(df.columns)
to make this script more generalizable and shorter.Upvotes: 1