Reputation: 447
I have two dataframes, each with a datetime index.
import pandas as pd
d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)
d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)
df1:
            x
dat
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6
df2:
             y
dat
2016-01-01  10
2017-01-01  11
I would like to join them using only the year and month parts of the index, so the output would look like the following:
df3:
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
I have tried to join them using
df1.join(df2, how='inner')
and I know that I can extract year and month parts like so:
df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))
But how can I combine these to achieve the desired result?
Many thanks
Upvotes: 5
Views: 957
Reputation: 294218
The information you want to merge on isn't explicitly defined anywhere, and there isn't a nice way to keep your dates in the index through a merge without destroying it. So we move the indices into the dataframe proper and create two new columns to merge on, namely year and month. I wrapped this part in a function to make it easier to see what's happening where.
def f(df):
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=['', '_'])
df.set_index('dat')[['x', 'y']]
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
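For reference, here is a self-contained sketch of the same helper-column merge with the setup inlined so it runs on its own. The name `add_year_month` is just an illustrative stand-in for `f`, and `validate='many_to_one'` is an optional extra that is not part of the answer above: it asks pandas to raise an error if `df2` ever holds more than one row for the same year and month, which would otherwise silently duplicate rows of `df1`.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)},
                   index=pd.DatetimeIndex(dates1, name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))


def add_year_month(df):
    # Hypothetical helper name: move 'dat' out of the index and
    # derive the two merge keys from it.
    df = df.reset_index()
    return df.assign(year=df['dat'].dt.year, month=df['dat'].dt.month)


merged = add_year_month(df1).merge(
    add_year_month(df2),
    on=['year', 'month'],
    suffixes=('', '_'),        # df1's 'dat' keeps its name, df2's becomes 'dat_'
    validate='many_to_one',    # assumption: df2 has at most one row per month
)
df3 = merged.set_index('dat')[['x', 'y']]
print(df3)
```

If `df2` can legitimately have several rows per month, drop the `validate` argument and decide explicitly how those rows should be combined before merging.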
This is a different concept using pd.Index.map and to_period. Create a dictionary mapping from df2 that translates the year/month period object to the corresponding value in column y. Then use map to map the period-ized dates in df1.index to the correct y values.
m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
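One thing worth checking with a lookup like this is what happens when df1 contains a month that df2 does not cover. The sketch below is a variant of the same period-based idea, not the code above: it re-keys `y` by monthly period and uses `Series.reindex` for the lookup, so unmatched months come back as NaN. The sample data (including the 2018-05 row) and the names `y_by_month` and `looked_up` are made up for illustration, and it assumes df2 has at most one row per month.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'x': [1, 2, 3]},
    index=pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2018-05-01'], name='dat'),
)
df2 = pd.DataFrame(
    {'y': [10, 11]},
    index=pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'),
)

# Re-key df2's y column by monthly period (one row per month is assumed).
y_by_month = df2['y'].copy()
y_by_month.index = df2.index.to_period('M')

# Look up the month of each df1 date; months absent from df2 become NaN.
looked_up = y_by_month.reindex(df1.index.to_period('M'))

# to_numpy() drops the period index so the values are assigned positionally.
out = df1.assign(y=looked_up.to_numpy())
print(out)
```

Because of the NaN, the `y` column ends up as floats here; follow with `dropna` to mimic an inner join, or `fillna` to supply a default.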
Setup
dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))
dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))
Upvotes: 5
Reputation: 153460
You could use merge with assign on the year and month from the DatetimeIndex:
df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
          .reset_index()  # keep 'dat' as a column so it survives the merge
          .merge(df2.assign(year=df2.index.year, month=df2.index.month), on=['year', 'month'])
          .set_index('dat')
          .drop(['year', 'month'], axis=1))
Output:
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
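The same merge can also be keyed on a single column, built with the `strftime('%Y-%m')` formatting mentioned in the question. This is only a sketch: the column name `ym` is an arbitrary choice, and the setup is repeated so the block runs on its own.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)},
                   index=pd.DatetimeIndex(dates1, name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

# One string key per row, e.g. '2016-01'.
key1 = df1.index.strftime('%Y-%m')
key2 = df2.index.strftime('%Y-%m')

df3 = (df1.assign(ym=key1)
          .reset_index()                        # keep df1's dates as a column
          .merge(df2.assign(ym=key2), on='ym')  # df2's own index is discarded
          .set_index('dat')
          .drop(columns='ym'))
print(df3)
```

A single key is a little less to type than separate year and month columns; the trade-off is that the key is a string rather than a pair of integers.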
Upvotes: 3