Reputation: 4199
I have a DataFrame like
In [67]: df
Out[67]:
id ts
0 a 2018-01-01
1 a 2018-01-02
2 a 2018-01-03
3 a 2018-01-04
4 a 2018-01-05
5 a 2018-01-06
6 a 2018-01-07
7 a 2018-01-08
8 b 2018-01-03
9 b 2018-01-04
10 b 2018-01-05
11 b 2018-01-06
12 b 2018-01-07
13 b 2018-01-08
14 b 2018-01-09
15 b 2018-01-10
16 b 2018-01-11
How can I extract the part where a
and b
has a same ts
?
id ts
2 a 2018-01-03
3 a 2018-01-04
4 a 2018-01-05
5 a 2018-01-06
6 a 2018-01-07
7 a 2018-01-08
8 b 2018-01-03
9 b 2018-01-04
10 b 2018-01-05
11 b 2018-01-06
12 b 2018-01-07
13 b 2018-01-08
There might be many unique id
beside a
and b
. I want all the intersection of column ts
.
what would be the expected output with an additional row of
c 2018-01-04
?
It would be
a 2018-01-04
b 2018-01-04
c 2018-01-04
Upvotes: 0
Views: 546
Reputation: 863401
Idea is reshape by DataFrame.pivot_table
, so get missing values for different datetimes, remove them by DataFrame.dropna
and then filter original DataFrame by Series.isin
:
df1 = df.pivot_table(index='ts', columns='id', aggfunc='size').dropna()
df = df[df['ts'].isin(df1.index)]
print (df)
id ts
2 a 2018-01-03
3 a 2018-01-04
4 a 2018-01-05
5 a 2018-01-06
6 a 2018-01-07
7 a 2018-01-08
8 b 2018-01-03
9 b 2018-01-04
10 b 2018-01-05
11 b 2018-01-06
12 b 2018-01-07
13 b 2018-01-08
Test if added new c
row:
df1 = df.pivot_table(index='ts', columns='id', aggfunc='size').dropna()
df = df[df['ts'].isin(df1.index)]
print (df)
id ts
3 a 2018-01-04
9 b 2018-01-04
17 c 2018-01-04
Upvotes: 2
Reputation: 88295
To keep only the intersecting values, you could take the groupby.size
of ts
, and check the which of these groups have a size equal to the amount of unique values in ts
. Then use the result to index the dataframe.
Checking on the proposed dataframe, and the additional row c 2018-01-04
, this would return only the intersecting dates in ts
:
s = df.groupby(df.ts).size().eq(df.id.nunique())
df[df.ts.isin(s[s].index)]
id ts
3 a 2018-01-04
9 b 2018-01-04
16 c 2018-01-04
Upvotes: 1