Reputation: 65
I currently have two pandas dataframes:
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
{'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
{'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215},
{'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95 }]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)
What I want to achieve is to show only the differences that are in 'test_2' and not in 'test_1'.
The code I currently have concatenates the two dataframes and shows me the total difference across both dataframes however all I want to see if the differences in 'test_2' to 'test_1' and not the reverse:
def compare_dataframes(df1, df2):
print 'Comparing dataframes...'
df = pd.concat([df1, df2])
df = df.reset_index(drop=True)
df_gpby = df.groupby(list(df.columns))
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
compared_data = df.reindex(idx)
if len(compared_data) > 1:
print 'No new sales on site!'
else:
print 'New sales on site!'
print(compared_data)
How could I adapt my current function to work like this?
Upvotes: 2
Views: 527
Reputation: 16
import pandas as pd
import numpy as np
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
{'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
{'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215},
{'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95 }]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)
test_3 = test_1.append(test_2).drop_duplicates(keep=False)
print (test_3)
it prints the different rows
Upvotes: 0
Reputation: 862601
Use merge
with outer join and indicator
parameter:
df = test_1.merge(test_2, how='outer', indicator=True)
print (df)
Feb Jan Mar account _merge
0 200 150 140 Jones LLC both
1 210 200 215 Alpha Co both
2 90 50 95 Blue Inc right_only
And then filter only right_only
rows by boolean indexing
:
only2 = df[df['_merge'] == 'right_only']
print (only2)
Feb Jan Mar account _merge
2 90 50 95 Blue Inc right_only
Thanks @Jon Clements for one line solution with callback:
only2 = test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only']
print (only2)
Feb Jan Mar account _merge
2 90 50 95 Blue Inc right_only
Or use query
:
only2 = test_1.merge(test_2, how='outer', indicator=True).query("_merge == 'right_only'")
Upvotes: 5