bigO6377
bigO6377

Reputation: 1274

KDB marginal percentages

Consider the table T which when tabulated, looks like this:

select sum Qty by Flag1,Flag2 from T

Flag1 Flag2 Qty
`Sunny `Hot 20
`Sunny `Cold 40
`Rainy `Hot 60
`Rainy `Cold 80

1) Is there a nifty way of computing the overall fraction of aggregate Qty that falls in each bucket (0.1,0.2,0.3,0.4)? Obviously one could do something like

select Qty % sum Qty from select sum Qty by Flag1,Flag2 from T

but is there anyway of doing this in one swoop (i.e. with only one select/exec statment),as this could get kind of cumbersome if you're doing this for many variables?

2) Now imagine I want to create a column that has the relative fraction of Qty but marginalized by Flag1. How do I create the following table?

Flag1 Flag2 Qty FracByFlag1
`Sunny `Hot 20  0.333
`Sunny `Cold 40 0.667
`Rainy `Hot 60 0.429
`Rainy `Cold 80 0.571

Upvotes: 1

Views: 1669

Answers (3)

Charles Lin
Charles Lin

Reputation: 712

1) I think it is simplest to sum the quantity of the column separately during division:

update (sum Qty) % sum T[`Qty] by Flag1, Flag2 from T

2) fby is exactly what you are after:

update FracByFlag1: Qty % (sum;Qty) fby Flag1 from T

Upvotes: 1

JPC
JPC

Reputation: 1919

you might want to consider making this a functional form query

more info on functional form here

q)t:([]f1:raze 2#'`s`r; f2:4#`h`c; qty:20 40 60 80)
q)p:{![x;();y;enlist[`pct]!enlist (%;z;(sum;z))]}
q)p[t;0b;`qty]
f1 f2 qty pct
-------------
s  h  20  0.1
s  c  40  0.2
r  h  60  0.3
r  c  80  0.4
q)p[t;enlist[`f1]!enlist `f1;`qty]
f1 f2 qty pct      
-------------------
s  h  20  0.3333333
s  c  40  0.6666667
r  h  60  0.4285714
r  c  80  0.5714286

Upvotes: 1

terrylynch
terrylynch

Reputation: 13572

I think you have to weight up the benefits of something "nifty" versus something efficient and easier to read. A double-select might just be the better approach.

One interesting extension is this: what if you need to see the different sub-ratios based on different flags, but all in the same table (and in a single select)? The solution below can achieve that BUT again I stress that this method would be highly inefficient - only showing it for the purpose of being "nifty"

q)T:([] Flag1:`Sunny`Sunny`Sunny`Sunny`Rainy`Rainy`Rainy`Rainy;Flag2:`Hot`Hot`Cold`Cold`Hot`Hot`Cold`Cold;Qty:10 10 20 20 30 30 40 40);
q)
q)T
Flag1 Flag2 Qty
---------------
Sunny Hot   10
Sunny Hot   10
Sunny Cold  20
Sunny Cold  20
Rainy Hot   30
Rainy Hot   30
Rainy Cold  40
Rainy Cold  40

q)myGroup:{@[;raze g] raze s%sum each s:sum each flip each y g:group x}
q)
q)select Flag1,Flag2,sum'[Qty],both:sum'[Qty]%sum raze Qty,f1:myGroup[Flag1;Qty],f2:myGroup[Flag2;Qty] from `Flag1`Flag2 xgroup T
Flag1 Flag2 Qty both f1        f2
----------------------------------------
Sunny Hot   20  0.1  0.3333333 0.25
Sunny Cold  40  0.2  0.6666667 0.3333333
Rainy Hot   60  0.3  0.4285714 0.75
Rainy Cold  80  0.4  0.5714286 0.6666667

Upvotes: 1

Related Questions