Reputation: 31
I have the following pandas DataFrame df with 10 rows and 4 columns, where each cell takes one of 3 categorical values:
df = pd.DataFrame(np.random.choice(["dog", "cat", "mice"], size=(10, 4)))
I would like to know all possible permutations of the values within each row, and create a new dataframe that groups the rows by their composition: for example, a group where one value appears twice in the same row (cat cat dog mice), or a group where the same value appears 4 times (pig pig pig pig), etc. I have tried itertools without success. Could someone give me some pointers? Thanks
Upvotes: 3
Views: 327
Reputation: 195528
I hope I've understood your question correctly. This example creates a Series whose index is the combination (each value paired with its count in the row) and whose values are the number of permutations that produce that combination:
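from collections import Counter
from itertools import permutations

print(
    df.assign(
        # For every row, build one order-independent key per permutation
        items=df.apply(
            lambda x: [
                frozenset(Counter(p).items()) for p in permutations(x, len(x))
            ],
            axis=1,
        )
    )
    .explode("items")   # one line per permutation
    .groupby("items")   # group identical compositions
    .size()
)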
Prints (for example):
items
((mice, 2), (dog, 2))              48
((cat, 1), (dog, 2), (mice, 1))    48
((cat, 3), (mice, 1))              24
((mice, 3), (cat, 1))              24
((dog, 1), (mice, 3))              48
((dog, 1), (cat, 2), (mice, 1))    24
((mice, 4))                        24
dtype: int64
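Note that Counter ignores order, so all 4! = 24 permutations of a row produce the same key. That is why every count above is a multiple of 24. If only the composition of each row matters, the permutation step can probably be skipped. Below is a minimal sketch of that idea, not a drop-in replacement: the seeded generator and the names rows_per_group and perms_per_group are my own choices for illustration.

```python
from collections import Counter
from math import factorial

import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

# One order-independent key per row: the set of (animal, count) pairs.
rows_per_group = Counter(
    frozenset(Counter(row).items()) for row in df.to_numpy().tolist()
)

# Every row of n items has n! permutations, all sharing the same key,
# so the permutation counts are just the row counts scaled by n!.
n_perms = factorial(df.shape[1])
perms_per_group = {key: n * n_perms for key, n in rows_per_group.items()}

for key, n in perms_per_group.items():
    print(sorted(key), n)
```

The scaled counts should match what the permutations-based version returns for the same df, while doing only one pass over the rows.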
EDIT: To get a dataframe:
x = (
    df.assign(
        items=df.apply(
            lambda x: [
                frozenset(Counter(p).items()) for p in permutations(x, len(x))
            ],
            axis=1,
        )
    )
    .explode("items")
    .groupby("items")
    .size()
)

# Turn each (value, count) key into a row; values missing from a key become 0
df_out = (
    pd.DataFrame([dict(i, count=v) for i, v in zip(x.index, x)])
    .fillna(0)
    .astype(int)
)
print(df_out)
Prints:
   dog  mice  cat  count
0    1     1    2     24
1    2     2    0     72
2    2     1    1     24
3    0     2    2     48
4    4     0    0     24
5    0     3    1     24
6    1     3    0     24
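If the goal is to group rows by the kind of repetition (for example "one value twice" or "the same value four times"), one possible approach is to label each row with its sorted counts. This is only a sketch under my reading of the question: the "pattern" column, the "rows" column and the seeded generator are hypothetical names and choices, not part of the original code.

```python
from collections import Counter

import numpy as np
import pandas as pd

animals = ["dog", "cat", "mice"]
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.choice(animals, size=(10, 4)))

rows = df.to_numpy().tolist()

# Per-row count of each animal; animals missing from a row become 0.
counts = (
    pd.DataFrame([Counter(row) for row in rows])
    .reindex(columns=animals)
    .fillna(0)
    .astype(int)
)

# Hypothetical "pattern" label: the counts sorted in descending order,
# ignoring which animal is which (e.g. "2-1-1" for cat cat dog mice).
counts["pattern"] = [
    "-".join(str(c) for c in sorted(Counter(row).values(), reverse=True))
    for row in rows
]

# Number of original rows for every distinct composition.
df_groups = counts.groupby(animals + ["pattern"]).size().reset_index(name="rows")
print(df_groups)
```

With 3 values and 4 columns the only possible patterns are 4, 3-1, 2-2 and 2-1-1, so a further groupby on "pattern" alone gives the coarser grouping.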
Upvotes: 1