How do I find the most common combinations of items on a grouped dataframe?

Question

I have a dataframe like this:

pd.DataFrame([{"order_id": 1234, "product": "milk"},
              {"order_id": 1234, "product": "butter"},
             {"order_id": 4321, "product": "bread"}, 
             {"order_id": 4321, "product": "milk"},
             {"order_id": 4321, "product": "butter"},
             {"order_id": 1111, "product": "corn"},
             {"order_id": 1111, "product": "cereal"},
             {"order_id": 8888, "product": "milk"}])

    order_id    product
0   1234    milk
1   1234    butter
2   4321    bread
3   4321    milk
4   4321    butter
5   1111    corn
6   1111    cereal
7   8888    milk

I need to find the most common combination of products, without having to infer how many products to put in these combinations.

This example shoud return milk and butter as the two most bought together itens.

I have tried grouping them by the order_id but could not find a solution to get the combinations inside the groups.

Quang Hoang · Accepted Answer

We can do find pairs of products by merge and groupby.size:

# merge on id to pair up the products
new_df = df.merge(df, on='order_id')

# first thing is to drop identical products
(new_df[new_df['product_x'].lt(new_df['product_y'])]
    .groupby(['order_id', 'product_x', 'product_y'])              # group
    .size()            # count (id, prod1, prod2)
    .sum(level=[1,2])  # sum over (prod1, prod2)
    .idxmax()          # get (prod1, prod2) with max count
)

gives you

('butter', 'milk')

How do I find the most common combinations of items on a grouped dataframe?

Answers (2)

`itertools.combinations` and `pandas.Series.mode`

`collections.Counter`

Related Questions

How do I find the most common combinations of items on a grouped dataframe?

Answers (2)

itertools.combinations and pandas.Series.mode

collections.Counter

Related Questions

`itertools.combinations` and `pandas.Series.mode`

`collections.Counter`