Reputation: 1809
If I have a DataFrame like this (very minimal example)
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 b 4
5 c 1
6 c 2
7 c 3
and I want the intersection of all col2
values when they are related to their unique col1
values (so in this case, the intersection would be [1,2]
), how can I do so with Pandas? Another way to word this would be the values in col2
that exist for every unique value in col1
.
My (bad) solution was to get the unique col1
elements with unique
, and then build dictionaries from each unique element in col1
and then take the set intersection of those dictionary values. I feel like there is a mechanism I should be using to relate the columns together however that could make this much easier.
Upvotes: 2
Views: 190
Reputation: 862521
Another solution:
print df.pivot_table(index="col1", columns="col2", aggfunc=len)
col2 1 2 3 4
col1
a 1.0 1.0 NaN NaN
b 1.0 1.0 NaN 1.0
c 1.0 1.0 1.0 NaN
ser = (df.pivot_table(index="col1", columns="col2", aggfunc=len) == 1).all()
print ser.index[ser]
Int64Index([1, 2], dtype='int64', name=u'col2')
Upvotes: 1
Reputation: 375435
One way is to use pivot_table
:
In [11]: cross = df.pivot_table(index="col1", columns="col2", aggfunc='size') == 1
In [12]: cross
Out[12]:
col2 1 2 3 4
col1
a True True False False
b True True False True
c True True True False
In [13]: cross.all()
Out[13]:
col2
1 True
2 True
3 False
4 False
dtype: bool
In [14]: cross.columns[cross.all()]
Out[14]: Int64Index([1, 2], dtype='int64', name='col2')
Upvotes: 1