Reputation: 21584
I'm aware of this question, and I've been working through it. I have two dataframes df1
and df2
which look like:
df1:
U,T,L,P
01,string1,a,0.9
02,string2,b,0.9
03,string3,c,0.9
df2:
U,T,L,P
01,string1,a,0.9
02,string2,d,0.9
03,string3,c,0.9
Basically the only difference is in the L
column and in the second row, where b
becomes d
. I would like to check this difference comparing the dataframes, keeping all the existing information. Following the above question, my attempts so far have been:
difference_locations = np.where(df1 != df2)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]
a = pd.DataFrame({'from': changed_from, 'to': changed_to})
which returns:
from to
0 b d
The result is ok, but I would like to add also the remaining information about the initial datasets. The only changes occur in the columns L
and P
, and I would highlight the changes so that the result:
U T from to
0 02 string2 b d
I tried to build the dataframe adding the existing columns with something like:
a = pd.DataFrame({'U': df1['U'],'from': changed_from, 'to': changed_to})
but of course it doesn't work since I should perform a lookup over the index to get the right values of U
and T
in correspondence of the changed row.
Upvotes: 1
Views: 1382
Reputation: 863741
You can use function merge on column L
. Then this column is renamed to to
.
a = pd.DataFrame({'from': changed_from, 'L': changed_to})
print a
# L from
#0 d b
out = pd.merge(a, df2, on=['L'])
out.rename(columns={'L':'to'}, inplace=True)
out = out[['U','T','from', 'to']]
print out
# U T from to
#0 2 string2 b d
Upvotes: 2