bigO6377
bigO6377

Reputation: 1274

KDB:selecting data “around” time of certain events Part2

Follow up to this question... KDB:selecting data "around" time of certain events

Consider a huge table of market data T. I am particularly interested in rows where Status=`SSS.

However, in addition to the rows given by (select from T where Status=`SSS), I also would like to select records that are within a certain short time interval around that event (as opposed to the earlier question, where we selected a fixed number of records surrounding the events). Note that in some cases, these intervals may overlap. What is an efficient way to do this?

Upvotes: 1

Views: 479

Answers (2)

Thomas Smyth
Thomas Smyth

Reputation: 5644

An alternative method to this is to use window join, wj1, which allows you to pass in custom windows for each time and perform an aggregation on the data within that window. A simplified explanation of the syntax is:

wj[window pairs;common columns;table 1;(table 2;(function;column))]

Taking the following tables as an example:

q)trade              / simplified trade table
time  sym
---------
09:00 a
09:30 a
10:00 a
q)quote              / simplified quote table
time  sym px
------------------
09:12 a   9.420396
09:29 a   6.416515
10:07 a   8.53406

To sums all the quote prices within a 20 minute window either side of the trade time for each sym we use the following method. First create pairs of start and end times for the windows:

q)show window:-20 20+\:x`time
08:40 09:10 09:40
09:20 09:50 10:20

Where each list has the same length as the trade table. Then pass this in to wj, alongside the aggregation function sum for the prices px:

q)wj1[window;`sym`time;x;(y;(sum;`px))]
time  sym px
------------------
09:00 a   9.420396
09:30 a   15.83691
10:00 a   8.53406

To investigate the values being aggregated in each window we can use the null function :::

q)wj1[window;`sym`time;x;(y;(::;`px))]
time  sym px
---------------------------
09:00 a   ,9.420396
09:30 a   9.420396 6.416515
10:00 a   ,8.53406

It should be noted that wj1 only considers value inside the window, whereas wj considers prevailing values to be part of the window.

Upvotes: 0

WooiKent Lee
WooiKent Lee

Reputation: 1311

We have an idea here that might help you:

q)n:10000000;
q)T:([]time:asc n?1D0;sym:n?3;price:n?100f;status:@[n?`3;-100000?n;:;`SSS])
q)f:{[t;x;d]t where 0<sums sum @[c#0;;+;]'[(-1+c:count t)&t[`time]binr/:x+/:-1 1*d;1 -1]}
q)f[T;exec time from T where status=`SSS;0D00:00:00.01]
time                 sym price     status
-----------------------------------------
0D00:00:01.169838756 2   77.1118   lbh
0D00:00:01.175813376 2   24.94157  emk
0D00:00:01.176316291 2   68.49994  SSS
0D00:00:01.180037856 1   81.54316  hhi
0D00:00:01.183518022 1   0.6516971 hni
0D00:00:01.291926205 2   51.94651  kjf
0D00:00:01.300173997 0   14.67675  SSS
0D00:00:01.309709250 1   82.77418  oji

The idea here is to extract out the time of event and use binr to find all the time windows that you need:

t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
`a  `b  `c  `S  `d  `e  `S  `f  `g  `h

Say t3, t5 and t6,t8 meets the time window, we put a marker around them

t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
`a  `b  `c  `S  `d  `e  `S  `f  `g  `h
0   0   1   0   0   1   0   0   0   0
0   0   0   0   0   -1  0   0   -1  0

sums sum will highlight all the records you need:

t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
`a  `b  `c  `S  `d  `e  `S  `f  `g  `h
0   0   1   1   1   1   1   1   0   0

then the rest is just straight forward...

Upvotes: 1

Related Questions