Generate permutation of a Pandas Dataframe

Question

This is a question about finding a better way to do something with pandas, more pythonic and more efficient. I am having trouble finding a optimal way to generate permutations of my data based on some condition.

Here is my input:

    Product     Stat1   Stat2   Stat3   Stat4   Stat5
1     A           3       1       2       4       3
2     B           3       4       3       2       1
3     C           3       4       2       3       1
4     D           4       2       4       2       4
5     E           5       4       1       1       2

And this is my desired output:

    Stat1   Stat2   Stat3   Stat4   Stat5
0     D       B       D       A       D
1     D       C       D       A       D
2     D       E       D       A       D
3     E       B       D       A       D
4     E       C       D       A       D
5     E       E       D       A       D

This code is to help you reproduce my input:

data = {
  'Product': {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'},
  'Stat1': {1: 3, 2: 3, 3: 3, 4: 4, 5: 5},
  'Stat2': {1: 1, 2: 4, 3: 4, 4: 2, 5: 4},
  'Stat3': {1: 2, 2: 3, 3: 2, 4: 4, 5: 1},
  'Stat4': {1: 4, 2: 2, 3: 3, 4: 2, 5: 1},
  'Stat5': {1: 3, 2: 1, 3: 1, 4: 4, 5: 2},
 }
df = pd.DataFrame.from_dict(data)

And this is how I generated my output:

clms = ['Stat1', 'Stat2', 'Stat3', 'Stat4', 'Stat5']
threshold = 3
permutation_df = pd.DataFrame(columns=clms)
for stat1 in df[df.Stat1 > threshold].Product.values:
  for stat2 in df[df.Stat2 > threshold].Product.values:
    for stat3 in df[df.Stat3 > threshold].Product.values:
      for stat4 in df[df.Stat4 > threshold].Product.values:
        for stat5 in df[df.Stat5 > threshold].Product.values:
          permutation_df = permutation_df.append(
              pd.Series([stat1, stat2, stat3, stat4, stat5,], 
              index=clms), 
              ignore_index=True
            )

This is sort of a brute force method, but I don't know any better as of now. Please share your knowledge on this :)

jezrael · Accepted Answer

Idea is create dictionary of lists by only values of Product in dict comprehension and then pass to itertools.product with DataFrame constructor:

from  itertools import product

d = {k: v.index[v > threshold].tolist() for k, v in df.set_index('Product')[clms].items()}
print (d)
{'Stat1': ['D', 'E'], 
 'Stat2': ['B', 'C', 'E'], 
 'Stat3': ['D'], 
 'Stat4': ['A'], 
 'Stat5': ['D']}

df1 = pd.DataFrame(product(*d.values()), columns=d.keys())
print (df1)
  Stat1 Stat2 Stat3 Stat4 Stat5
0     D     B     D     A     D
1     D     C     D     A     D
2     D     E     D     A     D
3     E     B     D     A     D
4     E     C     D     A     D
5     E     E     D     A     D

EDIT:

from  itertools import product
d = {k: v[v > threshold] for k, v in df.set_index('Product')[clms].items()}
print (d)
{'Stat1': Product
D    4
E    5
Name: Stat1, dtype: int64, 'Stat2': Product
B    4
C    4
E    4
Name: Stat2, dtype: int64, 'Stat3': Product
D    4
Name: Stat3, dtype: int64, 'Stat4': Product
A    4
Name: Stat4, dtype: int64, 'Stat5': Product
D    4
Name: Stat5, dtype: int64}

s = pd.DataFrame(product(*d.values())).sum(axis=1)
print (s)
0    20
1    20
2    20
3    21
4    21
5    21
dtype: int64

df2=pd.DataFrame(product(*[v.index for k,v in d.items()]), columns=d.keys()).assign(sums=s)
print (df2)
  Stat1 Stat2 Stat3 Stat4 Stat5  sums
0     D     B     D     A     D    20
1     D     C     D     A     D    20
2     D     E     D     A     D    20
3     E     B     D     A     D    21
4     E     C     D     A     D    21
5     E     E     D     A     D    21

Generate permutation of a Pandas Dataframe

Answers (1)

Related Questions