Reputation: 1274
Follow up to this question... KDB:selecting data "around" time of certain events
Consider a huge table of market data T. I am particularly interested in rows where Status=`SSS.
However, in addition to the rows given by (select from T where Status=`SSS), I also would like to select records that are within a certain short time interval around that event (as opposed to the earlier question, where we selected a fixed number of records surrounding the events). Note that in some cases, these intervals may overlap. What is an efficient way to do this?
Upvotes: 1
Views: 479
Reputation: 5644
An alternative method to this is to use window join, wj1
, which allows you to pass in custom windows for each time and perform an aggregation on the data within that window. A simplified explanation of the syntax is:
wj[window pairs;common columns;table 1;(table 2;(function;column))]
Taking the following tables as an example:
q)trade / simplified trade table
time sym
---------
09:00 a
09:30 a
10:00 a
q)quote / simplified quote table
time sym px
------------------
09:12 a 9.420396
09:29 a 6.416515
10:07 a 8.53406
To sums all the quote
prices within a 20 minute window either side of the trade
time for each sym
we use the following method. First create pairs of start and end times for the windows:
q)show window:-20 20+\:x`time
08:40 09:10 09:40
09:20 09:50 10:20
Where each list has the same length as the trade
table. Then pass this in to wj
, alongside the aggregation function sum
for the prices px
:
q)wj1[window;`sym`time;x;(y;(sum;`px))]
time sym px
------------------
09:00 a 9.420396
09:30 a 15.83691
10:00 a 8.53406
To investigate the values being aggregated in each window we can use the null function ::
:
q)wj1[window;`sym`time;x;(y;(::;`px))]
time sym px
---------------------------
09:00 a ,9.420396
09:30 a 9.420396 6.416515
10:00 a ,8.53406
It should be noted that wj1
only considers value inside the window, whereas wj
considers prevailing values to be part of the window.
Upvotes: 0
Reputation: 1311
We have an idea here that might help you:
q)n:10000000;
q)T:([]time:asc n?1D0;sym:n?3;price:n?100f;status:@[n?`3;-100000?n;:;`SSS])
q)f:{[t;x;d]t where 0<sums sum @[c#0;;+;]'[(-1+c:count t)&t[`time]binr/:x+/:-1 1*d;1 -1]}
q)f[T;exec time from T where status=`SSS;0D00:00:00.01]
time sym price status
-----------------------------------------
0D00:00:01.169838756 2 77.1118 lbh
0D00:00:01.175813376 2 24.94157 emk
0D00:00:01.176316291 2 68.49994 SSS
0D00:00:01.180037856 1 81.54316 hhi
0D00:00:01.183518022 1 0.6516971 hni
0D00:00:01.291926205 2 51.94651 kjf
0D00:00:01.300173997 0 14.67675 SSS
0D00:00:01.309709250 1 82.77418 oji
The idea here is to extract out the time of event and use binr
to find all the time windows that you need:
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
`a `b `c `S `d `e `S `f `g `h
Say t3, t5 and t6,t8 meets the time window, we put a marker around them
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
`a `b `c `S `d `e `S `f `g `h
0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 -1 0 0 -1 0
sums sum
will highlight all the records you need:
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
`a `b `c `S `d `e `S `f `g `h
0 0 1 1 1 1 1 1 0 0
then the rest is just straight forward...
Upvotes: 1