RVA92
RVA92

Reputation: 736

Finding intersection of pandas data frame index in groupby

I am using Python and have a data frame with a datetime index, a grouping variable (gvar) and a value variable (x). I would like to find all the common datetimes between the groups.

I already have a solution using functools, but I am seeking a way to do it using pandas functionalities only (if possible).

import functools
import pandas as pd

gvar =  ['A', 'A', 'A', 'B', 'B', 'B']
x = [100, 200, 100, 200 , 100, 200]
ind = ['2018-01-01','2018-01-02', '2018-01-03', '2018-01-03', '2018-01-04', '2018-01-05' ]
df = pd.DataFrame(data={'gvar':gvar, 'x': x},  index=pd.to_datetime(ind))

common_time = functools.reduce(lambda x, y: pd.np.intersect1d(x, y), [df[df.gvar == x].index
                                       for x in set(df.gvar)])

common_time
Out[39]: array(['2018-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

All suggestions are welcome.

Upvotes: 0

Views: 308

Answers (2)

ansev
ansev

Reputation: 30920

Use groupby.transform:

df_filtered=df[df.groupby(level=0)['gvar'].transform('size').ge(2)]
print(df_filtered)

           gvar    x
2018-01-03    A  100
2018-01-03    B  200

df_filtered.index.unique()
#DatetimeIndex(['2018-01-03'], dtype='datetime64[ns]', freq=None)

Upvotes: 1

Adam Zeldin
Adam Zeldin

Reputation: 908

This should do it:

>>> df.reset_index().loc[df['gvar'].reset_index().drop_duplicates().duplicated('index'),'index'].tolist()

Returning:

[Timestamp('2018-01-03 00:00:00')]

And if you need the corresponding groups or values:

>>>df[df.index.isin(df.reset_index().loc[df['gvar'].reset_index().drop_duplicates().duplicated('index'),'index'].tolist())]

Giving you:

           gvar    x
2018-01-03    A  100
2018-01-03    B  200

Upvotes: 1

Related Questions