Robert
Robert

Reputation: 183

Comparing dataframes to Rows in Excel

[Someone suggested that this is a duplicate of Pandas Merging 101. I disagree. That page talks only about merging two existing pandas. It does not explain how to do a comparison between excel and an existing panda. Even if i did a join with two dataframes, it appears that that would require even more work. I was trying to find a way to reduce how much code I need to write.]

i'm extremely new to Pandas and Dataframes, so i'm hoping I could get some suggestions on how to improve a script I wrote.

I'm trying to write a program that will find rows in a dataframe which correspond to rows on an excel sheet. The rows in the dataframe won't be in order with the rows in excel, and in many cases may not even exist. The important thing is that the values in the appropriate columns match. Once these matches are identified, I want to then add only the matching rows in the dataframe to the excel sheet.

I do have a script that works, but its not pretty, and I know there has to be a better way to do this.

Here is an example of the code I wrote with the corresponding information. Any suggestions on how to make this better would be appreciated.

import pandas as pd
import openpyxl
from openpyxl import load_workbook


wb =load_workbook("test.xlsx")
ws = wb["testsheet"]


data = {'ColA':['BA', 'AG', 'BC', 'AS', 'BW', 'AA'], 
        'ColB':['BA', 'AH', 'BC', 'AS', 'BW', 'AB'],
        'ColC':['BA', 'AI', 'BC', 'AS', 'BW', 'AC']}

df = pd.DataFrame(data)


row_count = len(ws['A'])
for i in range(2, row_count +1):
    for k in range(len(df)):
        if ((ws.cell(i,1).value == df.iloc[k]['ColA']) and 
            (ws.cell(i,2).value == df.iloc[k]['ColB']) and  
            (ws.cell(i,3).value == df.iloc[k]['ColC'])):
            for j in range(0,3):
                ws.cell(i, j + 5).value = df.iloc[k][j]

wb.save("test.xlsx")

Here's the excel sheet "testsheet":

enter image description here

Upvotes: 1

Views: 295

Answers (1)

Kermit
Kermit

Reputation: 5992

  • Similar to how you checked the length of rows to see how many iterations you need to do... you could also check the length of the columns len(df.columns) to make this script more generalizable and shorter.
  • Also, you could access 'ColA' by column number rather than by an explicit string.

Upvotes: 1

Related Questions