Reputation: 1251
I have a pandas dataframe with several items and counts as follows:
index | item1 | item2 | item3 | count1 | count2 | count3 |
---|---|---|---|---|---|---|
1 | 0 | 0.5 | 0.5 | 10 | 15 | 0 |
2 | 0.5 | 0 | 0.5 | 20 | 20 | 20 |
3 | 1 | 0 | 0 | 30 | 10 | 30 |
4 | 0 | 1 | 0 | 20 | 20 | 0 |
I want to groupby
the data by items, aggregating with a product so that I end up with a Dataframe that has the items as index, the counts as columns and the values at cell i, j are equal to the sum of item{i} * count{j}
. For instance:
index | count1 | count2 |
---|---|---|
item1 | (0 * 10) + (0.5 * 20) + (1 * 30) + (0 * 20) = 40 | (0 * 15) + (0.5 * 20) + (1 * 10) + (0 * 20) = 20 |
item2 | (0.5 * 10) + (0 * 20) + (0 * 30) + (1 * 20) = 25 | (0.5 * 15) + (0 * 20) + (0 * 10) + (1 * 20) = 27.5 |
I have tried using groupby
:
df[items + counts].groupby(items).agg('prod')
and
df.groupby(items)[counts].agg('prod')
but the problem is that groupby
uses the values of the column rather than the column themselves, and I'm running into the same issue with pivot_table
:
df.pivot_table(index=items, values=counts, aggfunc='prod')
I feel like the solution should be trivial, but I can't quite put the finger on what I'm missing.
Upvotes: 4
Views: 212
Reputation: 61910
IIUC, you could use dot of items vs counts:
# create DataFrame with only item columns
items = df.filter(regex='^item')
# create DataFrame with only count columns
counts = df.filter(regex='^count')
# compute dot product
res = items.T.dot(counts)
print(res)
Output
count1 count2 count3
item1 40.0 20.0 40.0
item2 25.0 27.5 0.0
item3 15.0 17.5 10.0
The items and counts DataFrames are obtained using filter.
Upvotes: 3