Fernanda F.
Fernanda F.

Reputation: 47

How to compare all values in one column to each other

I need to compare each value from one column to each value from same column. I need to use permutation.

For example, I have the following dataframe:

name | value
  A  |  2
  B  |  3
  C  |  1
  D  |  5

And I need to compare every letter on the "name" column to each other like this:

name_1 | name_2 | value_1 | value_2
  A    |   B    |    2    |   3
  A    |   C    |    2    |   1
  A    |   D    |    2    |   5
  B    |   A    |    3    |   2
  B    |   C    |    3    |   1
  B    |   D    |    3    |   5
  C    |   A    |    1    |   2
  C    |   B    |    1    |   3
  C    |   D    |    1    |   5
  D    |   A    |    5    |   2
  D    |   B    |    5    |   3
  D    |   C    |    5    |   1

How can I do this? Is there a Pandas way to do this maybe?

Thanks in advance!

Upvotes: 3

Views: 130

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

some love from itertools module provides an alternative solution :

from itertools import permutations,chain
#step out of pandas 
m = df.to_numpy().tolist()
get the combination of all entries
step1 = permutations(m,2)
concat = chain.from_iterable
columns = ['name_1','value_1','name_2','value_2']
#get the data back into a dataframe
res = (pd.DataFrame(map(concat,step1), columns = columns)
       .sort_index(axis=1)
      )
res


 name_1 name_2  value_1 value_2
0   A     B        2      3
1   A     C        2      1
2   A     D        2      5
3   B     A        3      2
4   B     C        3      1
5   B     D        3      5
6   C     A        1      2
7   C     B        1      3
8   C     D        1      5
9   D     A        5      2
10  D     B        5      3
11  D     C        5      1

Upvotes: 0

ALollz
ALollz

Reputation: 59519

Here's a method with product

import pandas as pd
import numpy as np
from itertools import product, chain

data = np.array([*product(df.to_numpy(), df.to_numpy())]).reshape(df.shape[0]**2, -1)
columns = chain.from_iterable([df.columns + f'_{i}' for i in range(df.shape[1])])

pd.DataFrame(data=data, columns=columns).query('name_0 != name_1')

   name_0 value_0 name_1 value_1
1       A       2      B       3
2       A       2      C       1
3       A       2      D       5
4       B       3      A       2
6       B       3      C       1
7       B       3      D       5
8       C       1      A       2
9       C       1      B       3
11      C       1      D       5
12      D       5      A       2
13      D       5      B       3
14      D       5      C       1

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

You can do a self-merge and query:

(df.assign(dummy=1)
   .merge(df.assign(dummy=1), on='dummy', suffixes=['_1','_2'])
   .query('name_1 != name_2')
   .drop('dummy', axis=1)
   .sort_index(axis=1)
)

Output:

   name_1 name_2  value_1  value_2
1       A      B        2        3
2       A      C        2        1
3       A      D        2        5
4       B      A        3        2
6       B      C        3        1
7       B      D        3        5
8       C      A        1        2
9       C      B        1        3
11      C      D        1        5
12      D      A        5        2
13      D      B        5        3
14      D      C        5        1

Upvotes: 5

Related Questions