Tommy
Tommy

Reputation: 515

Merging Pandas DataFrame within a specific Date Range

I've 2 DataFrames. One for Invoice Data and the other for Promotion Dates.

Invoice Data DataFrame

+------------+-------------+---------------+
| LocationNo | InvoiceDate | InvoiceAmount |
+------------+-------------+---------------+
|      A     |  01-Jul-20  |       79      |
+------------+-------------+---------------+
|      B     |  01-Jul-20  |       72      |
+------------+-------------+---------------+
|      C     |  01-Jul-20  |       24      |
+------------+-------------+---------------+
|      A     |  02-Jul-20  |       68      |
+------------+-------------+---------------+
|      B     |  02-Jul-20  |       6       |
+------------+-------------+---------------+
|      C     |  02-Jul-20  |       27      |
+------------+-------------+---------------+
|      A     |  03-Jul-20  |       25      |
+------------+-------------+---------------+
|      B     |  03-Jul-20  |       62      |
+------------+-------------+---------------+
|      C     |  03-Jul-20  |       58      |
+------------+-------------+---------------+
|      D     |  03-Jul-20  |       36      |
+------------+-------------+---------------+
|      E     |  03-Jul-20  |       65      |
+------------+-------------+---------------+
|      F     |  03-Jul-20  |       81      |
+------------+-------------+---------------+

df_1 = pd.DataFrame({
    'LocationNo':['A','B','C','A','B','C','A','B','C','D','E','F'],
    'InvoiceDate':['01-Jul-20','01-Jul-20','01-Jul-20','02-Jul-20','02-Jul-20','02-Jul-20',
                  '03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20'],
    'InvoiceAmount':[79,72,24,68,6,27,25,62,58,36,65,81]
})

Promotion Dates DataFrame

+------------+----------------+--------------+
| LocationNo | PromotionStart | PromotionEnd |
+------------+----------------+--------------+
|      A     |    01-Jul-20   |   02-Jul-20  |
+------------+----------------+--------------+
|      B     |    02-Jul-20   |   03-Jul-20  |
+------------+----------------+--------------+
|      C     |    03-Jul-20   |   05-Jul-20  |
+------------+----------------+--------------+
|      D     |    01-Jul-20   |   05-Jul-20  |
+------------+----------------+--------------+
|      E     |    02-Jul-20   |   02-Jul-20  |
+------------+----------------+--------------+
|      F     |    05-Jul-20   |   06-Jul-20  |
+------------+----------------+--------------+

df_2 = pd.DataFrame({
    'LocationNo' : ['A','B','C','D','E','F'],
    'PromotionStart':['01-Jul-20','02-Jul-20','03-Jul-20','01-Jul-20','02-Jul-20','05-Jul-20'],
    'PromotionEnd':['02-Jul-20','03-Jul-20','05-Jul-20','05-Jul-20','02-Jul-20','06-Jul-20',]
})

My Task is to Merge both the DataFrames based on 2 conditions & add value Yes to a new column Promotion if these conditions are met.

conditions are as follows

  1. LocationNo should match
  2. The InvoiceDate should be between the respective LocationNo's PromotionStart and PromotionEnd.

Basically, to merge only If the InvoiceDate falls between PromotionStart and PromotionEnd dates.

My desired output is as below

+------------+-------------+---------------+-----------+
| LocationNo | InvoiceDate | InvoiceAmount | Promotion |
+------------+-------------+---------------+-----------+
|      A     |  01-Jul-20  |       79      |    Yes    |
+------------+-------------+---------------+-----------+
|      B     |  01-Jul-20  |       72      |           |
+------------+-------------+---------------+-----------+
|      C     |  01-Jul-20  |       24      |           |
+------------+-------------+---------------+-----------+
|      A     |  02-Jul-20  |       68      |    Yes    |
+------------+-------------+---------------+-----------+
|      B     |  02-Jul-20  |       6       |    Yes    |
+------------+-------------+---------------+-----------+
|      C     |  02-Jul-20  |       27      |           |
+------------+-------------+---------------+-----------+
|      A     |  03-Jul-20  |       25      |           |
+------------+-------------+---------------+-----------+
|      B     |  03-Jul-20  |       62      |    Yes    |
+------------+-------------+---------------+-----------+
|      C     |  03-Jul-20  |       58      |    Yes    |
+------------+-------------+---------------+-----------+
|      D     |  03-Jul-20  |       36      |    Yes    |
+------------+-------------+---------------+-----------+
|      E     |  03-Jul-20  |       65      |           |
+------------+-------------+---------------+-----------+
|      F     |  03-Jul-20  |       81      |           |
+------------+-------------+---------------+-----------+

I know how to merge columns based on multiple conditions. But, in this case, I need to merge within Date Range.

kindly, let me know how should i approach to solve this.

Upvotes: 2

Views: 1392

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

Let's try merge and filter:

out = df_1.merge(df_2, on='LocationNo', how='left')

df_1['Promotion'] = np.where(out['InvoiceDate'].between(out['PromotionStart'], out['PromotionEnd']),
                             'Yes', '')

Output:

   LocationNo InvoiceDate  InvoiceAmount Promotion
0           A   01-Jul-20             79       Yes
1           B   01-Jul-20             72          
2           C   01-Jul-20             24          
3           A   02-Jul-20             68       Yes
4           B   02-Jul-20              6       Yes
5           C   02-Jul-20             27          
6           A   03-Jul-20             25          
7           B   03-Jul-20             62       Yes
8           C   03-Jul-20             58       Yes
9           D   03-Jul-20             36       Yes
10          E   03-Jul-20             65          
11          F   03-Jul-20             81          

Upvotes: 4

Related Questions