Reputation: 1739
I have these two pandas dataframes I created and cleaned from online data, and I was trying to merge them based on their dates, which are all by month. However, the first dataset has its days on the last day of the month, whether the second dataset is based on the first day of the month.
# data1
0 1987-01-01 63.752
1 1987-02-01 64.152
2 1987-03-01 64.488
3 1987-04-01 64.995
# data2
0 1987-01-31 1115.10
1 1987-02-30 1095.63
2 1987-03-30 1036.19
3 1987-04-30 1057.08
I would normally merge them by something like this if I had daily data with a few missing days
data3 = pd.merge(left=data1, left_on='Date', right=data2, right_on='Date')
but in this case they are never matching, even though they are all similar dates.
How would I go about "telling" Pandas to combine the datasets based on dates that are just a few days apart, and name each data by just "month - year"? I don't know where to begin.
Upvotes: 2
Views: 1960
Reputation: 9274
IIUC, you want to merge on the closest dates? This is what merge_asof
is for
If dates are not already converted to datetime, convert them like so
data1.date = pd.to_datetime(data1.date)
data2.date = pd.to_datetime(data2.date)
now complete the merge
pd.merge_asof(data1,data2,on='date',direction='nearest')
date value_x value_y
0 1987-01-01 63.752 1115.10
1 1987-02-01 64.152 1036.19
2 1987-03-01 64.488 1057.08
3 1987-04-01 64.995 1057.08
Upvotes: 4
Reputation: 51335
If your date columns are called date
, you can do something along these lines:
data1['date'] = pd.to_datetime(data1['date'])
data2['date'] = pd.to_datetime(data2['date'])
data1['month'] = data1['date'].dt.month
data1['year'] = data1['date'].dt.year
data2['month'] = data2['date'].dt.month
data2['year'] = data2['date'].dt.year
Which would result in something like this:
>>> data1
date value month year
0
0 1987-01-01 63.752 1 1987
1 1987-02-01 64.152 2 1987
2 1987-03-01 64.488 3 1987
3 1987-04-01 64.995 4 1987
>>> data2
date value month year
0
0 1987-01-01 1115.10 1 1987
1 1987-01-02 1095.63 1 1987
2 1987-02-01 1036.19 2 1987
3 1987-02-28 1057.08 2 1987
You can then merge those on month and year:
data3=data1.merge(data2, on=['month', 'year'])
Upvotes: 1