Reputation: 515
I've 2 DataFrames. One for Invoice Data
and the other for Promotion Dates
.
Invoice Data
DataFrame
+------------+-------------+---------------+
| LocationNo | InvoiceDate | InvoiceAmount |
+------------+-------------+---------------+
| A | 01-Jul-20 | 79 |
+------------+-------------+---------------+
| B | 01-Jul-20 | 72 |
+------------+-------------+---------------+
| C | 01-Jul-20 | 24 |
+------------+-------------+---------------+
| A | 02-Jul-20 | 68 |
+------------+-------------+---------------+
| B | 02-Jul-20 | 6 |
+------------+-------------+---------------+
| C | 02-Jul-20 | 27 |
+------------+-------------+---------------+
| A | 03-Jul-20 | 25 |
+------------+-------------+---------------+
| B | 03-Jul-20 | 62 |
+------------+-------------+---------------+
| C | 03-Jul-20 | 58 |
+------------+-------------+---------------+
| D | 03-Jul-20 | 36 |
+------------+-------------+---------------+
| E | 03-Jul-20 | 65 |
+------------+-------------+---------------+
| F | 03-Jul-20 | 81 |
+------------+-------------+---------------+
df_1 = pd.DataFrame({
'LocationNo':['A','B','C','A','B','C','A','B','C','D','E','F'],
'InvoiceDate':['01-Jul-20','01-Jul-20','01-Jul-20','02-Jul-20','02-Jul-20','02-Jul-20',
'03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20'],
'InvoiceAmount':[79,72,24,68,6,27,25,62,58,36,65,81]
})
Promotion Dates
DataFrame
+------------+----------------+--------------+
| LocationNo | PromotionStart | PromotionEnd |
+------------+----------------+--------------+
| A | 01-Jul-20 | 02-Jul-20 |
+------------+----------------+--------------+
| B | 02-Jul-20 | 03-Jul-20 |
+------------+----------------+--------------+
| C | 03-Jul-20 | 05-Jul-20 |
+------------+----------------+--------------+
| D | 01-Jul-20 | 05-Jul-20 |
+------------+----------------+--------------+
| E | 02-Jul-20 | 02-Jul-20 |
+------------+----------------+--------------+
| F | 05-Jul-20 | 06-Jul-20 |
+------------+----------------+--------------+
df_2 = pd.DataFrame({
'LocationNo' : ['A','B','C','D','E','F'],
'PromotionStart':['01-Jul-20','02-Jul-20','03-Jul-20','01-Jul-20','02-Jul-20','05-Jul-20'],
'PromotionEnd':['02-Jul-20','03-Jul-20','05-Jul-20','05-Jul-20','02-Jul-20','06-Jul-20',]
})
My Task is to Merge both the DataFrames based on 2 conditions & add value Yes
to a new column Promotion
if these conditions are met.
conditions are as follows
LocationNo
should matchInvoiceDate
should be between the respective LocationNo
's PromotionStart
and PromotionEnd
.Basically, to merge only If the InvoiceDate
falls between PromotionStart
and PromotionEnd
dates.
My desired output is as below
+------------+-------------+---------------+-----------+
| LocationNo | InvoiceDate | InvoiceAmount | Promotion |
+------------+-------------+---------------+-----------+
| A | 01-Jul-20 | 79 | Yes |
+------------+-------------+---------------+-----------+
| B | 01-Jul-20 | 72 | |
+------------+-------------+---------------+-----------+
| C | 01-Jul-20 | 24 | |
+------------+-------------+---------------+-----------+
| A | 02-Jul-20 | 68 | Yes |
+------------+-------------+---------------+-----------+
| B | 02-Jul-20 | 6 | Yes |
+------------+-------------+---------------+-----------+
| C | 02-Jul-20 | 27 | |
+------------+-------------+---------------+-----------+
| A | 03-Jul-20 | 25 | |
+------------+-------------+---------------+-----------+
| B | 03-Jul-20 | 62 | Yes |
+------------+-------------+---------------+-----------+
| C | 03-Jul-20 | 58 | Yes |
+------------+-------------+---------------+-----------+
| D | 03-Jul-20 | 36 | Yes |
+------------+-------------+---------------+-----------+
| E | 03-Jul-20 | 65 | |
+------------+-------------+---------------+-----------+
| F | 03-Jul-20 | 81 | |
+------------+-------------+---------------+-----------+
I know how to merge columns based on multiple conditions. But, in this case, I need to merge within Date Range
.
kindly, let me know how should i approach to solve this.
Upvotes: 2
Views: 1392
Reputation: 150735
Let's try merge
and filter:
out = df_1.merge(df_2, on='LocationNo', how='left')
df_1['Promotion'] = np.where(out['InvoiceDate'].between(out['PromotionStart'], out['PromotionEnd']),
'Yes', '')
Output:
LocationNo InvoiceDate InvoiceAmount Promotion
0 A 01-Jul-20 79 Yes
1 B 01-Jul-20 72
2 C 01-Jul-20 24
3 A 02-Jul-20 68 Yes
4 B 02-Jul-20 6 Yes
5 C 02-Jul-20 27
6 A 03-Jul-20 25
7 B 03-Jul-20 62 Yes
8 C 03-Jul-20 58 Yes
9 D 03-Jul-20 36 Yes
10 E 03-Jul-20 65
11 F 03-Jul-20 81
Upvotes: 4