miu
miu

Reputation: 199

how to compare dict data within existing excel file

how could I compare the data that I got in dictionary format with the existing xlsx file. Dictionary keys are same as the excel header. What is the easy way to compare? Here is the example: Data in Dict:

{'Sheet1': [{'ID': 1.0, 'NAME': 'hostname1', 'IP_ADDRESS': '192.168.1.1', 'STATUS': 'completed'}, {'ID': 2.0, 'NAME': 'hostname2', 'IP_ADDRESS': '192.168.1.2', 'STATUS': 'completed'}, {'ID': 3.0, 'NAME': 'hostname3', 'IP_ADDRESS': '192.168.1.3', 'STATUS': 'in_progress'}, {'ID': 4.0, 'NAME': 'hostname4', 'IP_ADDRESS': '192.168.1.4', 'STATUS': 'completed'}]}

Excel file: enter image description here

I would like to compare the dict value against the excel file (by the sheet), If the value is different in the Dict, update the excel file. If the entry doesn't exist in the excel simply add the new entry in the excel.

How should I start? compare and update and extend new entry in the excel file sheet. Thanks

##### Update
def writexlsx_multi_sheets(filename, data, sheet_name, **to_excel_kwargs):

from openpyxl import load_workbook

orginalDF = pd.read_excel(filename, sheet_name=sheet_name, index=False)
newDF = pd.DataFrame.from_dict(data)
if 'engine' in to_excel_kwargs:
    to_excel_kwargs.pop('engine')

writer = pd.ExcelWriter(filename, engine='openpyxl')
writer.book = load_workbook(filename)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

if not orginalDF.equals(newDF):
    result = orginalDF.combine_first(newDF)
    result.to_excel(writer, sheet_name=sheet_name, index=False)
    writer.save()
else:
    return False

Upvotes: 0

Views: 315

Answers (1)

Darkknight
Darkknight

Reputation: 1836

First, convert your dictionary to a data frame.

import pandas as pd
data = {'ID': 1.0, 'NAME': 'hostname1', 'IP_ADDRESS': '192.168.1.1', 'STATUS': 'completed'}, {'ID': 2.0, 'NAME': 'hostname2', 'IP_ADDRESS': '192.168.1.2', 'STATUS': 'completed'}, {'ID': 3.0, 'NAME': 'hostname3', 'IP_ADDRESS': '192.168.1.3', 'STATUS': 'in_progress'}, {'ID': 4.0, 'NAME': 'hostname4', 'IP_ADDRESS': '192.168.1.4', 'STATUS': 'completed'}
df1 = pd.DataFrame.from_dict(data)
df1
# output

   ID       NAME   IP_ADDRESS       STATUS
0   1  hostname1  192.168.1.1    completed
1   2  hostname2  192.168.1.2    completed
2   3  hostname3  192.168.1.3  in_progress
3   4  hostname4  192.168.1.4    completed

Then read your xlsx file to another data frame. let's assume you have one fewer data in excel file

df2 = pd.read_csv('filename')
df2

# output
   ID       NAME   IP_ADDRESS       STATUS
0   1  hostname1  192.168.1.1    completed
1   2  hostname2  192.168.1.2    completed
2   3  hostname3  192.168.1.3    NaN

now use combine_first() function to check

df3 = df2.combine_first(df1)
df3

# output
  ID       NAME   IP_ADDRESS       STATUS
0  1  hostname1  192.168.1.1    completed
1  2  hostname2  192.168.1.2    completed
2  3  hostname3  192.168.1.3  in_progress
3  4  hostname4  192.168.1.4    completed

After comparing everything you can save it to the excel file

df3.to_csv('filename')

Upvotes: 1

Related Questions