ComplicatedPhenomenon
ComplicatedPhenomenon

Reputation: 4199

select the rows by intersection of a column

I have a DataFrame like

In [67]: df                                                                                                         
Out[67]: 
   id         ts
0   a 2018-01-01
1   a 2018-01-02
2   a 2018-01-03
3   a 2018-01-04
4   a 2018-01-05
5   a 2018-01-06
6   a 2018-01-07
7   a 2018-01-08
8   b 2018-01-03
9   b 2018-01-04
10  b 2018-01-05
11  b 2018-01-06
12  b 2018-01-07
13  b 2018-01-08
14  b 2018-01-09
15  b 2018-01-10
16  b 2018-01-11

How can I extract the part where a and b has a same ts?

   id         ts
2   a 2018-01-03
3   a 2018-01-04
4   a 2018-01-05
5   a 2018-01-06
6   a 2018-01-07
7   a 2018-01-08
8   b 2018-01-03
9   b 2018-01-04
10  b 2018-01-05
11  b 2018-01-06
12  b 2018-01-07
13  b 2018-01-08

There might be many unique id beside a and b. I want all the intersection of column ts.

what would be the expected output with an additional row of c 2018-01-04?

It would be

a 2018-01-04
b 2018-01-04
c 2018-01-04

Upvotes: 0

Views: 546

Answers (2)

jezrael
jezrael

Reputation: 863401

Idea is reshape by DataFrame.pivot_table, so get missing values for different datetimes, remove them by DataFrame.dropna and then filter original DataFrame by Series.isin:

df1 = df.pivot_table(index='ts', columns='id', aggfunc='size').dropna()

df = df[df['ts'].isin(df1.index)]
print (df)

   id          ts
2   a  2018-01-03
3   a  2018-01-04
4   a  2018-01-05
5   a  2018-01-06
6   a  2018-01-07
7   a  2018-01-08
8   b  2018-01-03
9   b  2018-01-04
10  b  2018-01-05
11  b  2018-01-06
12  b  2018-01-07
13  b  2018-01-08

Test if added new c row:

df1 = df.pivot_table(index='ts', columns='id', aggfunc='size').dropna()

df = df[df['ts'].isin(df1.index)]
print (df)
   id          ts
3   a  2018-01-04
9   b  2018-01-04
17  c  2018-01-04

Upvotes: 2

yatu
yatu

Reputation: 88295

To keep only the intersecting values, you could take the groupby.size of ts, and check the which of these groups have a size equal to the amount of unique values in ts. Then use the result to index the dataframe.

Checking on the proposed dataframe, and the additional row c 2018-01-04, this would return only the intersecting dates in ts:

s = df.groupby(df.ts).size().eq(df.id.nunique())
df[df.ts.isin(s[s].index)]


   id          ts
3   a  2018-01-04
9   b  2018-01-04
16  c  2018-01-04

Upvotes: 1

Related Questions