Reputation: 51
I have a scenario where I want to find non-matching rows between two dataframes. Both dataframes have around 30 columns and an id column that uniquely identifies each record/row. I want to check whether a row in df1 is different from the corresponding row in df2. df1 is the updated dataframe and df2 is the previous version.
I have tried pd.concat([df1, df2]).drop_duplicates(keep=False), but it just combines both dataframes. Is there a way to do this? I would really appreciate the help.
The sample data has these columns in both dataframes:
id | user_id | type | status
There will be 39 columns in total, and they may contain NULL values.
Thanks.
P.S. df2 will always be a subset of df1.
Upvotes: 2
Views: 2251
Reputation: 72
If df1 and df2 have the same shape and identical row and column labels, you can compare them cell by cell with this code (it assumes numpy is imported as np and pandas as pd):
df3 = pd.DataFrame(np.where(df1==df2,True,False), columns=df1.columns)
The result contains False for every cell whose value does not match.
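Since the question says df2 is a subset of df1 and the columns may hold NULLs, the cell-by-cell comparison needs two adjustments: the rows have to be aligned on id first, and NaN has to be treated as equal to NaN (in pandas, NaN == NaN evaluates to False). Here is a minimal sketch of one way to do that. The helper name changed_rows and the sample values are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

def changed_rows(df_new, df_old, key="id"):
    """Rows of df_new that are missing from df_old or differ in any column.

    Hypothetical helper: assumes `key` is unique in both frames.
    """
    new = df_new.set_index(key)
    # Align the old frame to the new frame's ids and columns so that
    # the two frames are identically labeled before comparing.
    old = df_old.set_index(key).reindex(index=new.index, columns=new.columns)
    # A cell matches if the values are equal, or if both are NULL.
    equal = (new == old) | (new.isna() & old.isna())
    # Ids that do not exist in the old frame always count as changed.
    in_old = new.index.isin(df_old[key])
    mask = (~equal.all(axis=1)).to_numpy() | ~in_old
    return df_new[mask]

# Made-up sample data using the column names from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "user_id": [10, 20, 30, 40],
    "type": ["a", "b", None, "d"],
    "status": ["x", "y", "z", np.nan],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "user_id": [10, 20, 30],
    "type": ["a", "B", None],
    "status": ["x", "y", "z"],
})

result = changed_rows(df1, df2)
print(result)  # rows with id 2 (type changed) and id 4 (not in df2)
```

Row 3 is not reported even though its type is NULL in both frames, which is the case the plain == comparison would flag as a mismatch.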
Upvotes: 1