NicoCasta
NicoCasta

Reputation: 75

Python Pandas: How to combine or merge two difrent size dataframes based on dates

I like to merge or combine two dataframes of different size df1 and df2, based on a range of dates, for example:

df1:

Date        Open  High  Low
2021-07-01  8.43  8.44  8.22 
2021-07-02  8.36  8.4   8.28
2021-07-06  8.22  8.23  8.06
2021-07-07  8.1   8.19  7.98
2021-07-08  8.07  8.1   7.91
2021-07-09  7.97  8.11  7.92
2021-07-12  8     8.2   8
2021-07-13  8.15  8.18  8.06
2021-07-14  8.18  8.27  8.12
2021-07-15  8.21  8.26  8.06
2021-07-16  8.12  8.23  8.07

df2:

Day of month   Revenue   Earnings
01             45000     4000
07             43500     5000
12             44350     6000
15             39050     7000

results should be something like this:

combination:

Date        Open  High  Low   Earnings
2021-07-01  8.43  8.44  8.22  4000
2021-07-02  8.36  8.4   8.28  4000
2021-07-06  8.22  8.23  8.06  4000
2021-07-07  8.1   8.19  7.98  5000
2021-07-08  8.07  8.1   7.91  5000
2021-07-09  7.97  8.11  7.92  5000
2021-07-12  8     8.2   8     6000
2021-07-13  8.15  8.18  8.06  6000
2021-07-14  8.18  8.27  8.12  6000
2021-07-15  8.21  8.26  8.06  7000
2021-07-16  8.12  8.23  8.07  7000

The Earnings column is merged based on a range of date, how can I do this in python pandas?

Upvotes: 5

Views: 746

Answers (2)

Dustin
Dustin

Reputation: 493

A more general approach is the following:

  1. First you introduce a key both dataframes share. In this case, the day of the month (or, potentially, multiple keys like day of the month and month). df1["day"] = df1["Date"].dt.day
  2. If you were to merge (leftjoin df2 on df1) now, you wouldn't have enough keys in df2, as there are days missing. To fill the gaps, we could interpolate, or use the naïve approach: If we don't know the Revenue / Earnings for a specific day, we take the last known one and apply no further calculation. One way to achieve this is described here: How to replace NaNs by preceding or next values in pandas DataFrame? df.fillna(method='ffill')
  3. Now we merge on our key. Following the doc https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html , we do it like this: df1.merge(df2, left_on='day')

Voilà!

Upvotes: 1

BENY
BENY

Reputation: 323226

Try merge_asof

#df1.date=pd.to_datetime(df1.date)
df1['Day of month'] = df1.Date.dt.day 
out = pd.merge_asof(df1, df2, on ='Day of month', direction = 'backward')
out
Out[213]: 
         Date  Open  High   Low  Day of month  Revenue  Earnings
0  2021-07-01  8.43  8.44  8.22           1    45000      4000
1  2021-07-02  8.36  8.40  8.28           2    45000      4000
2  2021-07-06  8.22  8.23  8.06           6    45000      4000
3  2021-07-07  8.10  8.19  7.98           7    43500      5000
4  2021-07-08  8.07  8.10  7.91           8    43500      5000
5  2021-07-09  7.97  8.11  7.92           9    43500      5000
6  2021-07-12  8.00  8.20  8.00          12    44350      6000
7  2021-07-13  8.15  8.18  8.06          13    44350      6000
8  2021-07-14  8.18  8.27  8.12          14    44350      6000
9  2021-07-15  8.21  8.26  8.06          15    39050      7000
10 2021-07-16  8.12  8.23  8.07          16    39050      7000

Upvotes: 2

Related Questions