How to compare a dataframe with another df and return new rows from the first df using pandas

Question

I am trying to ascertain if values in a test dataframe (df2) are not appearing in another DF (df1). The following are the two DFs:

df1 created from the following source:

field1	field2
AG	Agree
SA	Somewhat Agree
DG	Disagree
SD	Somewhat Disagree
NO	None

df2 created from the following source:

field1	field2
CA	California
TX	Texas
NO	None
NY	New York

Using Method 1 (see below), I am getting the expected result, which is:

Method 1

diff_df = df2[~(df2[field1].isin(df1[field1]) & df2[field2].isin(df1[field2]))].reset_index(drop=True)

This gives me the folllowing expected result:

  field1               field2
0     CA           California
1     TX                Texas
2     NY             New York

Note: The duplicate value in df2 (NO: None) gets dropped, too.

However, there is one problem that I am facing: There can be situations when there are different set of fields that will need to be compared (eg. there may be a third field field3 in the equation).

From case to case basis, the number of fields would vary greatly over which the user won't have control.

My problem: How do I modify my query so that by comparing the two dataframes I get the expected result?

In the situation as explained, what shuld be the possible approach?

How to compare a dataframe with another df and return new rows from the first df using pandas

Answers (1)

Related Questions