olyashevska

Reputation: 447

join dataframes using parts of datetime index

I have two dataframes with datetime index.

import pandas as pd

d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)

d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)

df1:

            x
dat          
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6

df2:

             y
dat           
2016-01-01  10
2017-01-01  11

I would like to join them using only the year and month parts of the index, so that the output looks like the following:

df3:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11

I have tried to join them using

df1.join(df2, how='inner')

and I know that I can extract year and month parts like so:

df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))

But how can I combine these to achieve the desired result?

Many thanks

Upvotes: 5

Views: 957

Answers (3)

Mario Pellegrini

Reputation: 51

df1.join(df2, how='left').ffill()
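
One caveat worth illustrating: a forward fill does not look at the month at all, it simply carries the last known value down the rows. The sketch below uses made-up data (a February row in df1 with no February entry in df2, which is not part of the question) to show what happens in that case. It assumes df1 is sorted by date.

```python
import pandas as pd

# Hypothetical data: df1 has a February row, df2 only covers January.
df1 = pd.DataFrame({'x': [1, 2, 3]},
                   pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-02-01'], name='dat'))
df2 = pd.DataFrame({'y': [10]},
                   pd.DatetimeIndex(['2016-01-01'], name='dat'))

# The left join matches on exact dates only; the forward fill then
# propagates the January value into every later row, including February.
filled = df1.join(df2, how='left').ffill()
print(filled)
```

So this approach fits the data in the question, where every month in df1 starts on a date present in df2, but it would need a month-aware key to be safe in general.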

Upvotes: 0

piRSquared

Reputation: 294218

The information you want to merge on isn't explicitly defined anywhere, and there isn't a nice way to keep your dates in the index through a merge without destroying it. So we move the indices into the dataframe proper and create two new columns to merge on, namely year and month. I wrapped this part in a function to make it easier to see what happens where.

def f(df):
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=['', '_'])

df.set_index('dat')[['x', 'y']]

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11

A different approach uses pd.Index.map and to_period. Create a dictionary from df2 that maps each year/month Period object to the corresponding value in column y. Then use map to translate the period-ized dates in df1.index into the correct y values.

m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
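
As a variation on the mapping idea, the lookup can also be held in a Series indexed by monthly periods and aligned with reindex. This is only a sketch: the name lookup and the extra February row are made up for illustration, and it assumes df2 has at most one row per month.

```python
import pandas as pd

# Hypothetical data: one row in df1 falls in a month that df2 does not cover.
dates1 = ['2016-01-01', '2016-01-02', '2016-02-01', '2017-01-03']
df1 = pd.DataFrame({'x': [1, 2, 3, 4]}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

# Lookup table keyed by monthly Period instead of a plain dict.
lookup = pd.Series(df2['y'].to_numpy(), index=df2.index.to_period('M'))

# reindex pulls the value for each row's month; months absent from df2 become NaN.
y = lookup.reindex(df1.index.to_period('M')).to_numpy()
result = df1.assign(y=y)
print(result)
```

Rows whose month is missing from df2 end up with NaN in y, which also turns the column into floats.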

Setup

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

Upvotes: 5

Scott Boston

Reputation: 153460

You could use merge with assign on year and month from the DatetimeIndex. Since merging on columns discards the index, move df1's index to a column first and restore it afterwards:

df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
      .reset_index()
      .merge(df2.assign(year=df2.index.year, month=df2.index.month), on=['year', 'month'])
      .set_index('dat')
      .drop(['year', 'month'], axis=1))

Output:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
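
If df2 could ever contain two rows for the same month, a merge on year and month would silently duplicate rows of df1. A possible guard, sketched here with the data from the question, is merge's validate argument. The names left and right are only illustrative, and how='left' is an assumption that every df1 row should be kept.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

# Keep df1's dates as a column so they survive the merge.
left = df1.assign(year=df1.index.year, month=df1.index.month).reset_index()
right = df2.assign(year=df2.index.year, month=df2.index.month)

# validate='many_to_one' raises a MergeError if a (year, month) pair repeats in df2.
df3 = (left.merge(right, on=['year', 'month'], how='left', validate='many_to_one')
           .set_index('dat')
           .drop(columns=['year', 'month']))
print(df3)
```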

Upvotes: 3
