Groupby multiple columns with product aggregation

Question

I have a pandas dataframe with several items and counts as follows:

index	item1	item2	item3	count1	count2	count3
1	0	0.5	0.5	10	15	0
2	0.5	0	0.5	20	20	20
3	1	0	0	30	10	30
4	0	1	0	20	20	0

I want to groupby the data by items, aggregating with a product so that I end up with a Dataframe that has the items as index, the counts as columns and the values at cell i, j are equal to the sum of item{i} * count{j}. For instance:

index	count1	count2
item1	(0 * 10) + (0.5 * 20) + (1 * 30) + (0 * 20) = 40	(0 * 15) + (0.5 * 20) + (1 * 10) + (0 * 20) = 20
item2	(0.5 * 10) + (0 * 20) + (0 * 30) + (1 * 20) = 25	(0.5 * 15) + (0 * 20) + (0 * 10) + (1 * 20) = 27.5

I have tried using groupby:

df[items + counts].groupby(items).agg('prod')

and

df.groupby(items)[counts].agg('prod')

but the problem is that groupby uses the values of the column rather than the column themselves, and I'm running into the same issue with pivot_table:

df.pivot_table(index=items, values=counts, aggfunc='prod')

I feel like the solution should be trivial, but I can't quite put the finger on what I'm missing.

Dani Mesejo · Accepted Answer

IIUC, you could use dot of items vs counts:

# create DataFrame with only item columns
items = df.filter(regex='^item')

# create DataFrame with only count columns
counts = df.filter(regex='^count')

# compute dot product
res = items.T.dot(counts)

print(res)

Output

       count1  count2  count3
item1    40.0    20.0    40.0
item2    25.0    27.5     0.0
item3    15.0    17.5    10.0

The items and counts DataFrames are obtained using filter.

Groupby multiple columns with product aggregation

Answers (1)

Related Questions