Patthebug
Patthebug

Reputation: 4787

Pythonic way to create pairs of values in a column in dataframe

I have a Dataframe that looks like this:

OwnerID    Value
1            A
1            B
1            C
1            D

This is the shortened version, I have thousands of values for OwnerID. I'd like to create pairs for the Value column where each Value is paired with every other Value, and have the result as list of pairs.

For example, for the OwnerID 1, the resultset should be the following lists:

[A,B]

[A,C]

[A,D]

[B,C]

[B,D]

[C,D]

I could write 2 for loops to achieve this, but that wouldn't be very efficient or pythonic. Would someone know a better way to achieve this?

Any help would be much appreciated.

Upvotes: 4

Views: 2760

Answers (5)

Vaishali
Vaishali

Reputation: 38415

import itertools as iter
df2 = df.groupby('OwnerID').Value.apply(lambda x: list(iter.combinations(x, 2)))

will return the desired output for each unique owner id

OwnerID
1    [(A, B), (A, C), (A, D), (B, C), (B, D), (C, D)]

Upvotes: 4

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Pandas solution (using .merge() and .query() methods):

Data:

In [10]: df
Out[10]:
   OwnerID Value
0        1     A
1        1     B
2        1     C
3        1     D
4        2     X
5        2     Y
6        2     Z

Solution:

In [9]: pd.merge(df, df, on='OwnerID', suffixes=['','2']).query("Value != Value2")
Out[9]:
    OwnerID Value Value2
1         1     A      B
2         1     A      C
3         1     A      D
4         1     B      A
6         1     B      C
7         1     B      D
8         1     C      A
9         1     C      B
11        1     C      D
12        1     D      A
13        1     D      B
14        1     D      C
17        2     X      Y
18        2     X      Z
19        2     Y      X
21        2     Y      Z
22        2     Z      X
23        2     Z      Y

If you need only lists:

In [17]: pd.merge(df, df, on='OwnerID', suffixes=['','2']) \
           .query("Value != Value2") \
           .filter(like='Value').values
Out[17]:
array([['A', 'B'],
       ['A', 'C'],
       ['A', 'D'],
       ['B', 'A'],
       ['B', 'C'],
       ['B', 'D'],
       ['C', 'A'],
       ['C', 'B'],
       ['C', 'D'],
       ['D', 'A'],
       ['D', 'B'],
       ['D', 'C'],
       ['X', 'Y'],
       ['X', 'Z'],
       ['Y', 'X'],
       ['Y', 'Z'],
       ['Z', 'X'],
       ['Z', 'Y']], dtype=object)

Upvotes: 6

AsheKetchum
AsheKetchum

Reputation: 1108

val = df['Value'].values
length = len(val)
pairs = [[val[i],val[j]] for i in xrange(length) for j in xrange(length) if i!=j]

Upvotes: 0

kyjanond
kyjanond

Reputation: 458

try itertools

import itertools

list(itertools.combinations(['a','b','c','d'], 2))

#result: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]

Upvotes: 0

B. Eckles
B. Eckles

Reputation: 1644

itertools is all you need.

Depending on how you want to combine them, try either permutations or combinations, for example.

Upvotes: 1

Related Questions