user2969402
user2969402

Reputation: 1251

Groupby multiple columns with product aggregation

I have a pandas dataframe with several items and counts as follows:

index item1 item2 item3 count1 count2 count3
1 0 0.5 0.5 10 15 0
2 0.5 0 0.5 20 20 20
3 1 0 0 30 10 30
4 0 1 0 20 20 0

I want to groupby the data by items, aggregating with a product so that I end up with a Dataframe that has the items as index, the counts as columns and the values at cell i, j are equal to the sum of item{i} * count{j}. For instance:

index count1 count2
item1 (0 * 10) + (0.5 * 20) + (1 * 30) + (0 * 20) = 40 (0 * 15) + (0.5 * 20) + (1 * 10) + (0 * 20) = 20
item2 (0.5 * 10) + (0 * 20) + (0 * 30) + (1 * 20) = 25 (0.5 * 15) + (0 * 20) + (0 * 10) + (1 * 20) = 27.5

I have tried using groupby:

df[items + counts].groupby(items).agg('prod')

and

df.groupby(items)[counts].agg('prod')

but the problem is that groupby uses the values of the column rather than the column themselves, and I'm running into the same issue with pivot_table:

df.pivot_table(index=items, values=counts, aggfunc='prod')

I feel like the solution should be trivial, but I can't quite put the finger on what I'm missing.

Upvotes: 4

Views: 212

Answers (1)

Dani Mesejo
Dani Mesejo

Reputation: 61910

IIUC, you could use dot of items vs counts:

# create DataFrame with only item columns
items = df.filter(regex='^item')

# create DataFrame with only count columns
counts = df.filter(regex='^count')

# compute dot product
res = items.T.dot(counts)

print(res)

Output

       count1  count2  count3
item1    40.0    20.0    40.0
item2    25.0    27.5     0.0
item3    15.0    17.5    10.0

The items and counts DataFrames are obtained using filter.

Upvotes: 3

Related Questions