Reputation: 1637
I have the following dataframe df1
:
Date Invoice Name Price Coupon Location
0 2017-12-24 700349 John Doe 59.95 NONE VAGG1
1 2017-12-24 700347 Joe Smith 59.95 GBMR GG
2 2017-12-24 700345 Dave Johnson 35.00 CHANGE VAGG1
3 2017-12-24 700342 Sue Davis 35.00 GADSLR VAGG1
4 2017-12-23 700329 Betty Clark 84.95 GADSLR GG2
and a second dataframe df2
:
Date Invoice Name Price Coupon Location
0 2017-12-24 800349 John Doe 59.95 NONE VAGG1
1 2017-12-24 800347 Joe Smith 59.95 GBMR GG
2 2017-12-24 800345 John Doe 17.95 CHANGE VAGG1
3 2017-12-24 800342 John Doe 9.95 GADSLR VAGG1
4 2017-12-23 800329 Sue Simpson 34.95 GADSLR GG2
I would like to create a third Dataframe, df3
, using the following logic.
df1
, check to see if there is a match.df2
to df3
, provided
that the price for that row does not match the price associated with
that name if df1
.So the output dataframe, df3
, should appear as follows:
+------------+---------+----------+-------+--------+----------+
| Date | Invoice | Name | Price | Coupon | Location |
+------------+---------+----------+-------+--------+----------+
| 2017-12-24 | 800345 | John Doe | 17.95 | CHANGE | VAGG1 |
| 2017-12-24 | 800342 | John Doe | 9.95 | GADSLR | VAGG1 |
+------------+---------+----------+-------+--------+----------+
Upvotes: 2
Views: 556
Reputation: 1637
The following code block:
df3 = pd.merge(df1, df2, on='Name', how='right')\
.query('Price_x != Price_y')\
.drop('Price_x', 1)\
.rename(columns={'Price_y' : 'Price'})
Results in df3 =
Date_x Invoice_x Name Coupon_x Location_x Date_y \
1 2017-12-24 700349.0 John Doe NONE VAGG1 2017-12-24
2 2017-12-24 700349.0 John Doe NONE VAGG1 2017-12-24
4 NaN NaN Sue Simpson NaN NaN 2017-12-23
Invoice_y Price Coupon_y Location_y
1 800345 17.95 CHANGE VAGG1
2 800342 9.95 GADSLR VAGG1
4 800329 34.95 GADSLR GG2
The extended code block:
df3 = pd.merge(df1, df2, on='Name', how='right')\
.query('Price_x != Price_y')\
.drop('Price_x', 1)\
.rename(columns={'Price_y' : 'Price'})\
.drop('Location_x',1)\
.drop('Coupon_x',1)\
.drop('Date_x',1)\
.drop('Invoice_x',1)\
.rename(columns={'Date_y' : 'Date'})\
.rename(columns={'Invoice_y' : 'Invoice'})\
.rename(columns={'Coupon_y' : 'Coupon'})\
.rename(columns={'Location_y' : 'Location'})
Results in df3 =
Name Date Invoice Price Coupon Location
1 John Doe 2017-12-24 800345 17.95 CHANGE VAGG1
2 John Doe 2017-12-24 800342 9.95 GADSLR VAGG1
4 Sue Simpson 2017-12-23 800329 34.95 GADSLR GG2
Which is problematic because it results in column rows which are out of order. With the addition of:
df3=df3[['Date', 'Invoice', 'Name', 'Price', 'Coupon', 'Location']]
We get df3 =
Date Invoice Name Price Coupon Location
1 2017-12-24 800345 John Doe 17.95 CHANGE VAGG1
2 2017-12-24 800342 John Doe 9.95 GADSLR VAGG1
4 2017-12-23 800329 Sue Simpson 34.95 GADSLR GG2
Which is close to the desired answer, except for the "Sue Simpson" entry, which should be absent.
Upvotes: 1
Reputation: 402323
Using merge
+ query
-
df1.merge(df2[['Name', 'Price']], on='Name')\
.query('Price_x != Price_y')\
.drop('Price_x', 1)\
.rename(columns={'Price_y' : 'Price'})
Date Invoice Name Coupon Location Price
1 2017-12-24 700349 John Doe NONE VAGG1 17.95
2 2017-12-24 700349 John Doe NONE VAGG1 9.95
Where df1
and df2
are your respective dataframes.
Upvotes: 1