Reputation: 344
I'm having the following table:
productName | feature | probability_for _feature |
---|---|---|
A | w | 0.2 |
A | z | 0.8 |
B | w | 0.2 |
B | z | 0.8 |
B | x | 0.3 |
I want to get for each productName the product of probability for it's feature meaning productName A have both feature w with probability 0.2 and feature z with probability 0.8 hence the product for feature A will bee 0.2*0.8= 0.16. Thus , in the table above I will get:
productName | features | probability_for _feature |
---|---|---|
A | [w,z] | 0.16 |
B | [w,z,x] | 0.048 |
Or just :
productName | probability |
---|---|
A | 0.16 |
B | 0.048 |
I didn't find any product or multiply function as aggregation function and would like to get some help . Thank you !
Upvotes: 1
Views: 453
Reputation: 291
For aggregation with any function you can use the scan operator.
scan operator example cumulative sum
Example with multiply:
datatable (productName: string, feature: string, probability_for_feature: double)
[
'A', 'w', 0.2,
'A', 'z', 0.8,
'B', 'w', 0.2,
'B', 'z', 0.8,
'B', 'x', 0.3,
]
| sort by productName asc
| partition by productName
(
// for every productName scan all rows
scan declare (probability: double= 1.0) with
(
// multiply probability for every row and return last result
step s1 output=last: true => probability = probability_for_feature * s1.probability;
)
)
| project productName, probability
Upvotes: 1
Reputation: 25895
Here's an option, using a cumulative sum and leveraging the fact that log(x1) + log(x2) … + log(xN) == log(x1 * x2 * … * xN)
:
datatable(productName:string, feature:string, probability_for_feature:double)
[
'A', 'w', 0.2,
'A', 'z', 0.8,
'B', 'w', 0.2,
'B', 'z', 0.8,
'B', 'x', 0.3,
]
| order by productName asc
| extend l = log10(probability_for_feature), rn = row_number()
| extend cumsum = row_cumsum(l, productName != prev(productName))
| summarize arg_max(rn, *), features = make_list(feature) by productName
| project productName, features, product = exp10(cumsum)
productName | features | product |
---|---|---|
A | [ "w", "z" ] |
0.16 |
B | [ "w", "z", "x" ] |
0.048 |
Upvotes: 1