Jess BR

Reputation: 31

Pandas dataframe: how to permute rows and create new groups of combinations

I have the following pandas DataFrame df with 10 rows and 4 columns, where each cell holds one of 3 categorical values:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice(["dog", "cat", "mice"], size=(10, 4)))

I would like to find all possible permutations of the row values and create a new DataFrame that groups the rows by their combination of values: for example, a group where one value appears twice in the same row (cat cat dog mice), or one where all four values are the same (mice mice mice mice), etc. I have tried with itertools without success. Could someone point me in the right direction? Thanks
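If "all possible groupings" means every combination a 4-column row could take, not only the ones present in df, itertools.combinations_with_replacement can enumerate them directly. This is a sketch of one possible reading of the question, not the method used in the answer below. The category list is copied from the df above, and the "shape" classification is an illustrative assumption:

```python
from collections import Counter
from itertools import combinations_with_replacement

categories = ["dog", "cat", "mice"]

# Every unordered group of 4 values drawn from the 3 categories,
# repeats allowed (e.g. ('dog', 'dog', 'cat', 'mice')).
groups = list(combinations_with_replacement(categories, 4))
print(len(groups))  # 15

# Classify each group by its "shape": how many times each value repeats,
# e.g. (2, 1, 1) = one value twice plus two others, (4,) = all four equal.
shapes = Counter(
    tuple(sorted(Counter(g).values(), reverse=True)) for g in groups
)
for shape, n in shapes.items():
    print(shape, n)
```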

Upvotes: 3

Views: 327

Answers (1)

Andrej Kesely

Reputation: 195528

I hope I've understood your question correctly. This example creates a Series whose index is the combination (a frozenset of (value, count) pairs) and whose values are the number of permutations that produce that combination:
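To see what one index entry looks like, here is the key computed for a single row. The row values are made up for illustration. All 24 orderings of a 4-value row collapse to the same frozenset, because Counter ignores order:

```python
from collections import Counter
from itertools import permutations

row = ["cat", "cat", "dog", "mice"]  # hypothetical example row

# Each permutation is reduced to a frozenset of (value, count) pairs,
# so the order of the values no longer matters.
keys = [frozenset(Counter(p).items()) for p in permutations(row, len(row))]

print(len(keys))        # 24
print(len(set(keys)))   # 1
print(sorted(keys[0]))  # [('cat', 2), ('dog', 1), ('mice', 1)]
```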

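from collections import Counter
from itertools import permutations

print(
    df.assign(
        # For each row, turn every permutation of its values into an
        # order-independent key: a frozenset of (value, count) pairs
        items=df.apply(
            lambda row: [
                frozenset(Counter(p).items()) for p in permutations(row, len(row))
            ],
            axis=1,
        )
    )
    .explode("items")  # one row per permutation key
    .groupby("items")
    .size()  # number of permutations mapping to each combination
)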

Prints (for example):

items
((mice, 2), (dog, 2))              48
((cat, 1), (dog, 2), (mice, 1))    48
((cat, 3), (mice, 1))              24
((mice, 3), (cat, 1))              24
((dog, 1), (mice, 3))              48
((dog, 1), (cat, 2), (mice, 1))    24
((mice, 4))                        24
dtype: int64
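Because every permutation of a row produces that row's frozenset key, each count above is a multiple of 24 (the 4! orderings of a 4-value row). For example, 48 means two rows share that combination. Here is a sketch of the same tally that skips generating permutations, assuming per-row grouping is what is wanted. The seeded generator is only there to make the example reproducible:

```python
from collections import Counter
from math import factorial

import numpy as np
import pandas as pd

# Seeded random data in the same shape as the question (seed is an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

# One key per row: the set of (value, how-many-times) pairs, ignoring order
keys = pd.Series(
    [frozenset(Counter(row).items()) for row in df.to_numpy().tolist()]
)

# All 4! = 24 orderings of a row share that row's key, so the
# permutation-based counts equal the per-row counts multiplied by 4!
counts = keys.value_counts() * factorial(df.shape[1])
print(counts)
```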

EDIT: To get a DataFrame (one column per category plus a count column):

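# Same per-permutation counting as above, stored in x
x = (
    df.assign(
        items=df.apply(
            lambda row: [
                frozenset(Counter(p).items()) for p in permutations(row, len(row))
            ],
            axis=1,
        )
    )
    .explode("items")
    .groupby("items")
    .size()
)
# Each index entry is a frozenset of (value, count) pairs; dict() turns it
# into columns, and categories absent from a combination become 0
df_out = (
    pd.DataFrame([dict(i, count=v) for i, v in x.items()])
    .fillna(0)
    .astype(int)
)
print(df_out)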

Prints (for example, from a different random df than above):

   dog  mice  cat  count
0    1     1    2     24
1    2     2    0     72
2    2     1    1     24
3    0     2    2     48
4    4     0    0     24
5    0     3    1     24
6    1     3    0     24
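A sketch that builds a table of the same form without permutations, under the same assumption that only each row's value counts matter. It counts the categories in each row, groups rows with identical counts, and scales by 4! so the numbers match the permutation-based ones. Drop the .mul() step to get plain row counts. The seeded data is a stand-in for the question's random df:

```python
from math import factorial

import numpy as np
import pandas as pd

# Seeded stand-in for the question's random df (the seed is an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

# How many times each category occurs in each row (one column per category)
per_row = df.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)

# Group rows with identical counts; scale by 4! = 24 orderings per row
# to match the permutation-based numbers (drop .mul() for plain row counts)
df_out = (
    per_row.groupby(list(per_row.columns))
    .size()
    .mul(factorial(df.shape[1]))
    .reset_index(name="count")
)
print(df_out)
```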

Upvotes: 1
