Reputation: 662
This is a question about finding a better way to do something with pandas, more pythonic and more efficient. I am having trouble finding a optimal way to generate permutations of my data based on some condition.
Here is my input:
Product Stat1 Stat2 Stat3 Stat4 Stat5
1 A 3 1 2 4 3
2 B 3 4 3 2 1
3 C 3 4 2 3 1
4 D 4 2 4 2 4
5 E 5 4 1 1 2
And this is my desired output:
Stat1 Stat2 Stat3 Stat4 Stat5
0 D B D A D
1 D C D A D
2 D E D A D
3 E B D A D
4 E C D A D
5 E E D A D
This code is to help you reproduce my input:
data = {
'Product': {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'},
'Stat1': {1: 3, 2: 3, 3: 3, 4: 4, 5: 5},
'Stat2': {1: 1, 2: 4, 3: 4, 4: 2, 5: 4},
'Stat3': {1: 2, 2: 3, 3: 2, 4: 4, 5: 1},
'Stat4': {1: 4, 2: 2, 3: 3, 4: 2, 5: 1},
'Stat5': {1: 3, 2: 1, 3: 1, 4: 4, 5: 2},
}
df = pd.DataFrame.from_dict(data)
And this is how I generated my output:
clms = ['Stat1', 'Stat2', 'Stat3', 'Stat4', 'Stat5']
threshold = 3
permutation_df = pd.DataFrame(columns=clms)
for stat1 in df[df.Stat1 > threshold].Product.values:
for stat2 in df[df.Stat2 > threshold].Product.values:
for stat3 in df[df.Stat3 > threshold].Product.values:
for stat4 in df[df.Stat4 > threshold].Product.values:
for stat5 in df[df.Stat5 > threshold].Product.values:
permutation_df = permutation_df.append(
pd.Series([stat1, stat2, stat3, stat4, stat5,],
index=clms),
ignore_index=True
)
This is sort of a brute force method, but I don't know any better as of now. Please share your knowledge on this :)
Upvotes: 2
Views: 721
Reputation: 863031
Idea is create dictionary of lists by only values of Product
in dict comprehension and then pass to itertools.product
with DataFrame
constructor:
from itertools import product
d = {k: v.index[v > threshold].tolist() for k, v in df.set_index('Product')[clms].items()}
print (d)
{'Stat1': ['D', 'E'],
'Stat2': ['B', 'C', 'E'],
'Stat3': ['D'],
'Stat4': ['A'],
'Stat5': ['D']}
df1 = pd.DataFrame(product(*d.values()), columns=d.keys())
print (df1)
Stat1 Stat2 Stat3 Stat4 Stat5
0 D B D A D
1 D C D A D
2 D E D A D
3 E B D A D
4 E C D A D
5 E E D A D
EDIT:
from itertools import product
d = {k: v[v > threshold] for k, v in df.set_index('Product')[clms].items()}
print (d)
{'Stat1': Product
D 4
E 5
Name: Stat1, dtype: int64, 'Stat2': Product
B 4
C 4
E 4
Name: Stat2, dtype: int64, 'Stat3': Product
D 4
Name: Stat3, dtype: int64, 'Stat4': Product
A 4
Name: Stat4, dtype: int64, 'Stat5': Product
D 4
Name: Stat5, dtype: int64}
s = pd.DataFrame(product(*d.values())).sum(axis=1)
print (s)
0 20
1 20
2 20
3 21
4 21
5 21
dtype: int64
df2=pd.DataFrame(product(*[v.index for k,v in d.items()]), columns=d.keys()).assign(sums=s)
print (df2)
Stat1 Stat2 Stat3 Stat4 Stat5 sums
0 D B D A D 20
1 D C D A D 20
2 D E D A D 20
3 E B D A D 21
4 E C D A D 21
5 E E D A D 21
Upvotes: 1