HMLDude
HMLDude

Reputation: 1637

Boolean Logic Comparing Data From Two Pandas Dataframes

I have the following dataframe df1:

          Date  Invoice          Name  Price  Coupon Location
0   2017-12-24   700349      John Doe  59.95    NONE    VAGG1
1   2017-12-24   700347     Joe Smith  59.95    GBMR       GG
2   2017-12-24   700345  Dave Johnson  35.00  CHANGE    VAGG1
3   2017-12-24   700342     Sue Davis  35.00  GADSLR    VAGG1
4   2017-12-23   700329   Betty Clark  84.95  GADSLR      GG2

and a second dataframe df2:

           Date  Invoice         Name  Price    Coupon    Location
0   2017-12-24    800349     John Doe  59.95   NONE      VAGG1
1   2017-12-24    800347    Joe Smith  59.95   GBMR      GG
2   2017-12-24    800345     John Doe  17.95   CHANGE    VAGG1
3   2017-12-24    800342     John Doe   9.95   GADSLR    VAGG1
4   2017-12-23    800329  Sue Simpson  34.95   GADSLR    GG2

I would like to create a third Dataframe, df3, using the following logic.

So the output dataframe, df3, should appear as follows:

+------------+---------+----------+-------+--------+----------+
|    Date    | Invoice |   Name   | Price | Coupon | Location |
+------------+---------+----------+-------+--------+----------+
| 2017-12-24 |  800345 | John Doe | 17.95 | CHANGE | VAGG1    |
| 2017-12-24 |  800342 | John Doe |  9.95 | GADSLR | VAGG1    |
+------------+---------+----------+-------+--------+----------+

Upvotes: 2

Views: 556

Answers (2)

HMLDude
HMLDude

Reputation: 1637

The following code block:

df3 = pd.merge(df1, df2, on='Name', how='right')\
        .query('Price_x != Price_y')\
        .drop('Price_x', 1)\
        .rename(columns={'Price_y' : 'Price'})

Results in df3 =

            Date_x  Invoice_x         Name Coupon_x Location_x        Date_y  \
1   2017-12-24   700349.0     John Doe     NONE      VAGG1   2017-12-24
2   2017-12-24   700349.0     John Doe     NONE      VAGG1   2017-12-24
4          NaN        NaN  Sue Simpson      NaN        NaN   2017-12-23

   Invoice_y  Price  Coupon_y  Location_y
1     800345  17.95   CHANGE    VAGG1
2     800342   9.95   GADSLR    VAGG1
4     800329  34.95   GADSLR    GG2

The extended code block:

df3 = pd.merge(df1, df2, on='Name', how='right')\
        .query('Price_x != Price_y')\
        .drop('Price_x', 1)\
        .rename(columns={'Price_y' : 'Price'})\
        .drop('Location_x',1)\
        .drop('Coupon_x',1)\
        .drop('Date_x',1)\
        .drop('Invoice_x',1)\
        .rename(columns={'Date_y' : 'Date'})\
        .rename(columns={'Invoice_y' : 'Invoice'})\
        .rename(columns={'Coupon_y' : 'Coupon'})\
        .rename(columns={'Location_y' : 'Location'})

Results in df3 =

          Name          Date  Invoice  Price    Coupon    Location
1     John Doe   2017-12-24    800345  17.95   CHANGE    VAGG1
2     John Doe   2017-12-24    800342   9.95   GADSLR    VAGG1
4  Sue Simpson   2017-12-23    800329  34.95   GADSLR    GG2

Which is problematic because it results in column rows which are out of order. With the addition of:

df3=df3[['Date', 'Invoice', 'Name', 'Price', 'Coupon', 'Location']]

We get df3 =

           Date  Invoice         Name  Price    Coupon    Location
1   2017-12-24    800345     John Doe  17.95   CHANGE    VAGG1
2   2017-12-24    800342     John Doe   9.95   GADSLR    VAGG1
4   2017-12-23    800329  Sue Simpson  34.95   GADSLR    GG2

Which is close to the desired answer, except for the "Sue Simpson" entry, which should be absent.

Upvotes: 1

cs95
cs95

Reputation: 402323

Using merge + query -

df1.merge(df2[['Name', 'Price']], on='Name')\
   .query('Price_x != Price_y')\
   .drop('Price_x', 1)\
   .rename(columns={'Price_y' : 'Price'})

         Date  Invoice      Name Coupon Location  Price
1  2017-12-24   700349  John Doe   NONE    VAGG1  17.95
2  2017-12-24   700349  John Doe   NONE    VAGG1   9.95

Where df1 and df2 are your respective dataframes.

Upvotes: 1

Related Questions