select the rows by intersection of a column

Question

I have a DataFrame like

In [67]: df                                                                                                         
Out[67]: 
   id         ts
0   a 2018-01-01
1   a 2018-01-02
2   a 2018-01-03
3   a 2018-01-04
4   a 2018-01-05
5   a 2018-01-06
6   a 2018-01-07
7   a 2018-01-08
8   b 2018-01-03
9   b 2018-01-04
10  b 2018-01-05
11  b 2018-01-06
12  b 2018-01-07
13  b 2018-01-08
14  b 2018-01-09
15  b 2018-01-10
16  b 2018-01-11

How can I extract the part where a and b has a same ts?

   id         ts
2   a 2018-01-03
3   a 2018-01-04
4   a 2018-01-05
5   a 2018-01-06
6   a 2018-01-07
7   a 2018-01-08
8   b 2018-01-03
9   b 2018-01-04
10  b 2018-01-05
11  b 2018-01-06
12  b 2018-01-07
13  b 2018-01-08

There might be many unique id beside a and b. I want all the intersection of column ts.

what would be the expected output with an additional row of c 2018-01-04?

It would be

a 2018-01-04
b 2018-01-04
c 2018-01-04

jezrael · Accepted Answer

Idea is reshape by DataFrame.pivot_table, so get missing values for different datetimes, remove them by DataFrame.dropna and then filter original DataFrame by Series.isin:

df1 = df.pivot_table(index='ts', columns='id', aggfunc='size').dropna()

df = df[df['ts'].isin(df1.index)]
print (df)

   id          ts
2   a  2018-01-03
3   a  2018-01-04
4   a  2018-01-05
5   a  2018-01-06
6   a  2018-01-07
7   a  2018-01-08
8   b  2018-01-03
9   b  2018-01-04
10  b  2018-01-05
11  b  2018-01-06
12  b  2018-01-07
13  b  2018-01-08

Test if added new c row:

df1 = df.pivot_table(index='ts', columns='id', aggfunc='size').dropna()

df = df[df['ts'].isin(df1.index)]
print (df)
   id          ts
3   a  2018-01-04
9   b  2018-01-04
17  c  2018-01-04

select the rows by intersection of a column

Answers (2)

Related Questions