Inner join of dataframes based on datetime

Question

I have two dataframes df1 and df2.

df1.index
DatetimeIndex(['2001-09-06', '2002-08-04', '2000-01-22', '2000-12-19',
               '2008-02-09', '2010-07-07', '2011-06-04', '2007-03-14',
               '2003-05-17', '2016-02-27',..dtype='datetime64[ns]', name=u'DateTime', length=6131, freq=None)

df2.index
DatetimeIndex(['2002-01-01 01:00:00', '2002-01-01 10:00:00',
               '2002-01-01 11:00:00', '2002-01-01 12:00:00',
               '2002-01-01 13:00:00', '2002-01-01 14:00:00',..dtype='datetime64[ns]', length=129273, freq=None)

i.e. df1 has index as days and df2 has index as datetime. I want to perform inner join of df1 and df2 on indexes such that if dates corresponding to hours in df2 is available in df1 we consider the inner join as true else false.

I want to obtain two df11 and df22 as output. df11 will have common dates and corresponding columns from df1. df22 will have common date-hours and corresponding columns from df2.

E.g. '2002-08-04' in df1 and '2002-08-04 01:00:00' in df2 is considered present in both.

If however '1802-08-04' in df1 has no hour in df2, it is not present in df11.

If however '2045-08-04 01:00:00' in df2 has no date in df1, it is not present in df22.

Right now I am using numpy in1d and pandas normalize functions to achieve this task in a lengthy manner. I was looking for pythonic way to achieve this.

Nickil Maveli · Accepted Answer

Consider a dummy DF constructed as shown:

idx1 = pd.date_range(start='2000/1/1', periods=100, freq='12D')
idx2 = pd.date_range(start='2000/1/1', periods=100, freq='300H')
np.random.seed([42, 314])

DF containing DateTimeIndex as only date attribute:

df1 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx1)
df1.head()

DF containing DateTimeIndex as date + time attribute:

df2 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx2)
df2.head()

Get common index considering only matching dates as the distinguishing parameter.

intersect = pd.Index(df2.index.date).intersection(df1.index)

First common index DF containing columns of it's original dataframe :

df11 = df1.loc[intersect]
df11

Second common index DF containing columns of it's original dataframe:

df22 = df2.iloc[np.where(df2.index.date.reshape(-1,1) == intersect.values)[0]]
df22

Inner join of dataframes based on datetime

Answers (1)

Related Questions