Coolio2654
Coolio2654

Reputation: 1739

Merge 2 Pandas dataframes by dates within a few days of each other?

I have these two pandas dataframes I created and cleaned from online data, and I was trying to merge them based on their dates, which are all by month. However, the first dataset has its days on the last day of the month, whether the second dataset is based on the first day of the month.

# data1
0    1987-01-01  63.752
1    1987-02-01  64.152
2    1987-03-01  64.488
3    1987-04-01  64.995

# data2
0   1987-01-31  1115.10
1   1987-02-30  1095.63
2   1987-03-30  1036.19
3   1987-04-30  1057.08

I would normally merge them by something like this if I had daily data with a few missing days

data3 = pd.merge(left=data1, left_on='Date', right=data2, right_on='Date')

but in this case they are never matching, even though they are all similar dates.

How would I go about "telling" Pandas to combine the datasets based on dates that are just a few days apart, and name each data by just "month - year"? I don't know where to begin.

Upvotes: 2

Views: 1960

Answers (2)

DJK
DJK

Reputation: 9274

IIUC, you want to merge on the closest dates? This is what merge_asof is for

If dates are not already converted to datetime, convert them like so

data1.date = pd.to_datetime(data1.date)
data2.date = pd.to_datetime(data2.date)

now complete the merge

pd.merge_asof(data1,data2,on='date',direction='nearest')

       date  value_x  value_y
0 1987-01-01   63.752  1115.10
1 1987-02-01   64.152  1036.19
2 1987-03-01   64.488  1057.08
3 1987-04-01   64.995  1057.08

Upvotes: 4

sacuL
sacuL

Reputation: 51335

If your date columns are called date, you can do something along these lines:

data1['date'] = pd.to_datetime(data1['date'])
data2['date'] = pd.to_datetime(data2['date'])

data1['month'] = data1['date'].dt.month
data1['year'] = data1['date'].dt.year

data2['month'] = data2['date'].dt.month
data2['year'] = data2['date'].dt.year

Which would result in something like this:

>>> data1
        date   value  month  year
0                                
0 1987-01-01  63.752      1  1987
1 1987-02-01  64.152      2  1987
2 1987-03-01  64.488      3  1987
3 1987-04-01  64.995      4  1987
>>> data2
        date    value  month  year
0                                 
0 1987-01-01  1115.10      1  1987
1 1987-01-02  1095.63      1  1987
2 1987-02-01  1036.19      2  1987
3 1987-02-28  1057.08      2  1987

You can then merge those on month and year:

data3=data1.merge(data2, on=['month', 'year'])

Upvotes: 1

Related Questions