Reputation:
I'm trying to find the average value by item type, and then set a variable y that takes the value:
This is what I've tried, which doesn't produce the desired result below:
tab2:([]items:`a`b`a`a`b; x:-6 8 0 -3 5)
tabsum: select mu:avg x by items from tab2;
tab2: update y:x from tab2
tab2: update y:mu from tab2 where x > 0 and x > mu / get error after running above step
tab2: update y:mu from tab2 where x < 0 and x <= mu
Desired result:
items x mu y
a -6 -3.0 -3.0
b 8 6.5 6.5
a 0 -3.0 NaN
a -3 -3.0 -3.0
b 5 6.5 5
Related to this data:
i) Is NaN the appropriate type for missing values in kdb? (NA
is different to NaN
or NULL
in R for example). I'm guessing so based on what I've read so far.
ii) Is there more efficient code to get the mu column inside tab2
? Making another table and merging I'm guessing isn't efficient (still learning basics of kdb)
iii) If I just run
tab2:([]items:`a`b`a`a`b; x:-6 8 0 -3 5)
tabsum: select mu:avg x by items from tab2;
tab2: update y:mu from tab2 where x > 0 and x > mu
I get:
items x mu y
1 a -6 -3.0 NaN
2 b 8 6.5 6.5
3 a 0 -3.0 NaN
4 a -3 -3.0 NaN
5 b 5 6.5 6.5
Line 5 doesn't make sense to me. Why is y = 6.5 if x < mu? I expect y =5 for that row. Clearly my understanding of what's going on is wrong.
iv) How can I get the desired result (the code I've posted doesn't work properly)
Upvotes: 0
Views: 412
Reputation: 1379
You can do this in one line:
tab3: update y:?[abs[x]>abs[mu];mu;x] from (update mu:avg x by items from tab2) where x<>0
Upvotes: 0
Reputation: 2069
This might not be the prettiest solution, but seems to fit your criteria. I'm using a vector conditional ?
to set the value of y:
q)show tab3: update y:?[((x>0) and x>mu) or ((x<0) and x<=mu);mu;x] from update mu:avg x by items from tab2
items x mu y
----------------
a -6 -3 -3
b 8 6.5 6.5
a 0 -3 0
a -3 -3 -3
b 5 6.5 5
And then replacing any y=0
values with null 0n
(rather than NaN):
q)update y:0n from tab3 where y=0
items x mu y
----------------
a -6 -3 -3
b 8 6.5 6.5
a 0 -3
a -3 -3 -3
b 5 6.5 5
The unexpected result on your last line is down to order of execution; q code is executed right-to-left. So:
5>0 and 5>6.5
actually means:
5>(0 and 5>6.5)
which is:
5>0
which evaluates to true. To make the desired comparison, you need to use brackets:
q)(5>0) and 5>6.5
0b
There's a few reasons why your block of code doesn't work. On line 2, you don't update tab2
with the mu
column; you just create a new table. So consequently when you try to use mu
on line 4, it doesn't exist (which will cause an error). On line 3 you set the values of y
to be the same as x
, but this also means the same type (integer). Then later when you try to reset y
to a float value from mu
you get a 'type
error.
Here's a block of code similar to yours that works:
q)tab2:([]items:`a`b`a`a`b; x:-6 8 0 -3 5)
q)tab2: update mu:avg x by items from tab2
q)tab2: update y:"f"$x from tab2
q)tab2: update y:mu from tab2 where (x>0) and (x> mu)
q)tab2: update y:mu from tab2 where (x<0) and (x<= mu)
q)tab2
items x mu y
----------------
a -6 -3 -3
b 8 6.5 6.5
a 0 -3 0
a -3 -3 -3
b 5 6.5 5
Upvotes: 2