Zanam
Zanam

Reputation: 4807

Inner join of dataframes based on datetime

I have two dataframes df1 and df2.

df1.index
DatetimeIndex(['2001-09-06', '2002-08-04', '2000-01-22', '2000-12-19',
               '2008-02-09', '2010-07-07', '2011-06-04', '2007-03-14',
               '2003-05-17', '2016-02-27',..dtype='datetime64[ns]', name=u'DateTime', length=6131, freq=None)

df2.index
DatetimeIndex(['2002-01-01 01:00:00', '2002-01-01 10:00:00',
               '2002-01-01 11:00:00', '2002-01-01 12:00:00',
               '2002-01-01 13:00:00', '2002-01-01 14:00:00',..dtype='datetime64[ns]', length=129273, freq=None)

i.e. df1 has index as days and df2 has index as datetime. I want to perform inner join of df1 and df2 on indexes such that if dates corresponding to hours in df2 is available in df1 we consider the inner join as true else false.

I want to obtain two df11 and df22 as output. df11 will have common dates and corresponding columns from df1. df22 will have common date-hours and corresponding columns from df2.

E.g. '2002-08-04' in df1 and '2002-08-04 01:00:00' in df2 is considered present in both.

If however '1802-08-04' in df1 has no hour in df2, it is not present in df11.

If however '2045-08-04 01:00:00' in df2 has no date in df1, it is not present in df22.

Right now I am using numpy in1d and pandas normalize functions to achieve this task in a lengthy manner. I was looking for pythonic way to achieve this.

Upvotes: 2

Views: 787

Answers (1)

Nickil Maveli
Nickil Maveli

Reputation: 29711

Consider a dummy DF constructed as shown:

idx1 = pd.date_range(start='2000/1/1', periods=100, freq='12D')
idx2 = pd.date_range(start='2000/1/1', periods=100, freq='300H')
np.random.seed([42, 314])

DF containing DateTimeIndex as only date attribute:

df1 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx1)
df1.head()

enter image description here

DF containing DateTimeIndex as date + time attribute:

df2 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx2)
df2.head()

enter image description here

Get common index considering only matching dates as the distinguishing parameter.

intersect = pd.Index(df2.index.date).intersection(df1.index)

First common index DF containing columns of it's original dataframe :

df11 = df1.loc[intersect]
df11

enter image description here

Second common index DF containing columns of it's original dataframe:

df22 = df2.iloc[np.where(df2.index.date.reshape(-1,1) == intersect.values)[0]]
df22

enter image description here

Upvotes: 2

Related Questions