Reputation: 199
how could I compare the data that I got in dictionary format with the existing xlsx file. Dictionary keys are same as the excel header. What is the easy way to compare? Here is the example: Data in Dict:
{'Sheet1': [{'ID': 1.0, 'NAME': 'hostname1', 'IP_ADDRESS': '192.168.1.1', 'STATUS': 'completed'}, {'ID': 2.0, 'NAME': 'hostname2', 'IP_ADDRESS': '192.168.1.2', 'STATUS': 'completed'}, {'ID': 3.0, 'NAME': 'hostname3', 'IP_ADDRESS': '192.168.1.3', 'STATUS': 'in_progress'}, {'ID': 4.0, 'NAME': 'hostname4', 'IP_ADDRESS': '192.168.1.4', 'STATUS': 'completed'}]}
I would like to compare the dict value against the excel file (by the sheet), If the value is different in the Dict, update the excel file. If the entry doesn't exist in the excel simply add the new entry in the excel.
How should I start? compare and update and extend new entry in the excel file sheet. Thanks
##### Updatedef writexlsx_multi_sheets(filename, data, sheet_name, **to_excel_kwargs):
from openpyxl import load_workbook
orginalDF = pd.read_excel(filename, sheet_name=sheet_name, index=False)
newDF = pd.DataFrame.from_dict(data)
if 'engine' in to_excel_kwargs:
to_excel_kwargs.pop('engine')
writer = pd.ExcelWriter(filename, engine='openpyxl')
writer.book = load_workbook(filename)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
if not orginalDF.equals(newDF):
result = orginalDF.combine_first(newDF)
result.to_excel(writer, sheet_name=sheet_name, index=False)
writer.save()
else:
return False
Upvotes: 0
Views: 315
Reputation: 1836
First, convert your dictionary to a data frame.
import pandas as pd
data = {'ID': 1.0, 'NAME': 'hostname1', 'IP_ADDRESS': '192.168.1.1', 'STATUS': 'completed'}, {'ID': 2.0, 'NAME': 'hostname2', 'IP_ADDRESS': '192.168.1.2', 'STATUS': 'completed'}, {'ID': 3.0, 'NAME': 'hostname3', 'IP_ADDRESS': '192.168.1.3', 'STATUS': 'in_progress'}, {'ID': 4.0, 'NAME': 'hostname4', 'IP_ADDRESS': '192.168.1.4', 'STATUS': 'completed'}
df1 = pd.DataFrame.from_dict(data)
df1
# output
ID NAME IP_ADDRESS STATUS
0 1 hostname1 192.168.1.1 completed
1 2 hostname2 192.168.1.2 completed
2 3 hostname3 192.168.1.3 in_progress
3 4 hostname4 192.168.1.4 completed
Then read your xlsx file to another data frame. let's assume you have one fewer data in excel file
df2 = pd.read_csv('filename')
df2
# output
ID NAME IP_ADDRESS STATUS
0 1 hostname1 192.168.1.1 completed
1 2 hostname2 192.168.1.2 completed
2 3 hostname3 192.168.1.3 NaN
now use combine_first()
function to check
df3 = df2.combine_first(df1)
df3
# output
ID NAME IP_ADDRESS STATUS
0 1 hostname1 192.168.1.1 completed
1 2 hostname2 192.168.1.2 completed
2 3 hostname3 192.168.1.3 in_progress
3 4 hostname4 192.168.1.4 completed
After comparing everything you can save it to the excel file
df3.to_csv('filename')
Upvotes: 1