pandas - check changes in dataframe

Question

I'm aware of this question, and I've been working through it. I have two dataframes df1 and df2 which look like:

df1:

U,T,L,P
01,string1,a,0.9
02,string2,b,0.9
03,string3,c,0.9

df2:

U,T,L,P
01,string1,a,0.9
02,string2,d,0.9
03,string3,c,0.9

Basically the only difference is in the L column and in the second row, where b becomes d. I would like to check this difference comparing the dataframes, keeping all the existing information. Following the above question, my attempts so far have been:

difference_locations = np.where(df1 != df2)

changed_from = df1.values[difference_locations]

changed_to = df2.values[difference_locations]

a = pd.DataFrame({'from': changed_from, 'to': changed_to})

which returns:

  from to
0    b  d

The result is ok, but I would like to add also the remaining information about the initial datasets. The only changes occur in the columns L and P, and I would highlight the changes so that the result:

  U  T       from to
0 02 string2    b  d

I tried to build the dataframe adding the existing columns with something like:

a = pd.DataFrame({'U': df1['U'],'from': changed_from, 'to': changed_to})

but of course it doesn't work since I should perform a lookup over the index to get the right values of U and T in correspondence of the changed row.

jezrael · Accepted Answer

You can use function merge on column L. Then this column is renamed to to.

a = pd.DataFrame({'from': changed_from, 'L': changed_to})
print a
#   L from
#0  d    b

out = pd.merge(a, df2, on=['L'])
out.rename(columns={'L':'to'}, inplace=True)
out = out[['U','T','from', 'to']]
print out
#   U        T from to
#0  2  string2    b  d

pandas - check changes in dataframe

Answers (1)

Related Questions