Eric Hansen
Eric Hansen

Reputation: 1809

Finding intersection of values in a column associated with unique values in another column Pandas

If I have a DataFrame like this (very minimal example)

  col1  col2
0    a     1
1    a     2
2    b     1
3    b     2
4    b     4
5    c     1
6    c     2
7    c     3

and I want the intersection of all col2 values when they are related to their unique col1 values (so in this case, the intersection would be [1,2]), how can I do so with Pandas? Another way to word this would be the values in col2 that exist for every unique value in col1.

My (bad) solution was to get the unique col1 elements with unique, and then build dictionaries from each unique element in col1 and then take the set intersection of those dictionary values. I feel like there is a mechanism I should be using to relate the columns together however that could make this much easier.

Upvotes: 2

Views: 190

Answers (2)

jezrael
jezrael

Reputation: 862521

Another solution:

print df.pivot_table(index="col1", columns="col2", aggfunc=len)
col2    1    2    3    4
col1                    
a     1.0  1.0  NaN  NaN
b     1.0  1.0  NaN  1.0
c     1.0  1.0  1.0  NaN

ser = (df.pivot_table(index="col1", columns="col2", aggfunc=len) == 1).all()
print  ser.index[ser]
Int64Index([1, 2], dtype='int64', name=u'col2')

Upvotes: 1

Andy Hayden
Andy Hayden

Reputation: 375435

One way is to use pivot_table:

In [11]: cross = df.pivot_table(index="col1", columns="col2", aggfunc='size') == 1

In [12]: cross
Out[12]:
col2     1     2      3      4
col1
a     True  True  False  False
b     True  True  False   True
c     True  True   True  False

In [13]: cross.all()
Out[13]:
col2
1     True
2     True
3    False
4    False
dtype: bool

In [14]: cross.columns[cross.all()]
Out[14]: Int64Index([1, 2], dtype='int64', name='col2')

Upvotes: 1

Related Questions