Reputation: 4807
I have two dataframes df1 and df2.
df1.index
DatetimeIndex(['2001-09-06', '2002-08-04', '2000-01-22', '2000-12-19',
'2008-02-09', '2010-07-07', '2011-06-04', '2007-03-14',
'2003-05-17', '2016-02-27',..dtype='datetime64[ns]', name=u'DateTime', length=6131, freq=None)
df2.index
DatetimeIndex(['2002-01-01 01:00:00', '2002-01-01 10:00:00',
'2002-01-01 11:00:00', '2002-01-01 12:00:00',
'2002-01-01 13:00:00', '2002-01-01 14:00:00',..dtype='datetime64[ns]', length=129273, freq=None)
i.e. df1 has index as days and df2 has index as datetime. I want to perform inner join of df1 and df2 on indexes such that if dates corresponding to hours in df2 is available in df1 we consider the inner join as true else false.
I want to obtain two df11 and df22 as output. df11 will have common dates and corresponding columns from df1. df22 will have common date-hours and corresponding columns from df2.
E.g. '2002-08-04' in df1 and '2002-08-04 01:00:00' in df2 is considered present in both.
If however '1802-08-04' in df1 has no hour in df2, it is not present in df11.
If however '2045-08-04 01:00:00' in df2 has no date in df1, it is not present in df22.
Right now I am using numpy in1d
and pandas normalize
functions to achieve this task in a lengthy manner. I was looking for pythonic way to achieve this.
Upvotes: 2
Views: 787
Reputation: 29711
Consider a dummy DF
constructed as shown:
idx1 = pd.date_range(start='2000/1/1', periods=100, freq='12D')
idx2 = pd.date_range(start='2000/1/1', periods=100, freq='300H')
np.random.seed([42, 314])
DF
containing DateTimeIndex
as only date attribute:
df1 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx1)
df1.head()
DF
containing DateTimeIndex
as date + time attribute:
df2 = pd.DataFrame(np.random.randint(0,10,(100,2)), idx2)
df2.head()
Get common index considering only matching dates as the distinguishing parameter.
intersect = pd.Index(df2.index.date).intersection(df1.index)
First common index DF
containing columns of it's original dataframe :
df11 = df1.loc[intersect]
df11
Second common index DF
containing columns of it's original dataframe:
df22 = df2.iloc[np.where(df2.index.date.reshape(-1,1) == intersect.values)[0]]
df22
Upvotes: 2