Reputation: 263
I have two datasets and I'm trying to compare only a single column based on unique ID. I want to track and flag any value changes in that column, and output those changes into another DF.
DF1:
ID Status
1234 Cleared
4321 Pending
5678 Distributed
8765 Validating
2468 Blocked
8642 Pending
1357 Pending
7531 Distributed
DF2:
ID Status
1234 Distributed
4321 Pending
5678 Pending
8765 Cleared
2468 Blocked
8642 Blocked
1357 Cleared
7531 Blocked
Output:
ID Status Status
1234 Cleared Distributed
5678 Distributed Pending
8765 Validating Cleared
8642 Pending Blocked
1357 Pending Cleared
7531 Distributed Blocked
Lastly, I am also trying to see any changes in another column based on changes in the status column. This column includes a list of countries using the standard ISO Alpha-2 country codes. Was thinking of doing a simple character count here but that doesn't make sense because if US was removed and replaced with DE, then character count would stay the same.
My code for all of this (repurposed from other questions here) is as follows, but I feel like there's probably a way more efficient way of doing this...
for index, compare_row in compare_df.iterrows():
row_df1 = df1.loc[df1['ID'] == compare_row['ID']]
row_df2 = df2.loc[df2['ID'] == compare_row['ID']]
if (row_df1.iloc[0]['Status'] != row_df2.iloc[0]['Status']):
print "here 1"
output_df.append(row_df1)
output_df.append(row_df2)
elif (row_df1.iloc[0]['Status'] in ['Cleared', 'Distributed']) & (row_df1.iloc[0]['Territory'] != row_df2.iloc[0]['Territory']):
print "here 2"
output_df.append(row_df1)
output_df.append(row_df2)
Upvotes: 1
Views: 79
Reputation: 153460
Use .query
for improved readability.
DF1.merge(DF2, on = 'ID').query('Status_x != Status_y')
Output:
ID Status_x Status_y
0 1234 Cleared Distributed
2 5678 Distributed Pending
3 8765 Validating Cleared
5 8642 Pending Blocked
6 1357 Pending Cleared
7 7531 Distributed Blocked
Upvotes: 0
Reputation: 2648
It might not be the most effective way, but at least reach the goal. :)
df3 = df1.copy()
df3['Status_df2'] = df2.Status.copy()
df3 = df3.loc[df3.Status != df3.Status_df2]
Upvotes: 0
Reputation: 1726
Use merge
:
df3 = df1.merge(df2, left_index = True, right_index = True)
mask = df3['Status_x'] == df3['Status_y']
df3 = df3[~mask]
Upvotes: 2