mchen
mchen

Reputation: 10136

How to join splayed table in KDB?

I have 2 very large (billions of rows) splayed tables, Trades and StockPrices, on a remote server. I want to do an asof join

h:hopen `:RemoteServer:Port
h"aj[`Stock`Date`Time,
        select from Trades where Date within 2014.04.01 2014.04.13,
        StockPrices
    ]"

But I just get the error (I'm Studio for KDB+)

An error occurred during execution of the query.
The server sent the response:
splay
Studio Hint: Possibly this error refers to nyi op on splayed table

So what would be the correct way to do such a join?

Also, performance and efficiency is an issue with such a big table -- what should I be doing to ensure the query doesn't take hours and doesn't consume to much of the server's system resources?

Upvotes: 1

Views: 1544

Answers (1)

MdSalih
MdSalih

Reputation: 1996

You need to map the splayed StockPrices table into memory. This can be done by using a select query:

q)(`::6060)"aj[`sym`time;select from trade;quote]"                      / bad
   'splay
q)(`::6060)"aj[`sym`time;select from trade;select from quote]"          / good
   sym time         prx      bid      ask
   -------------------------------------------
   aea 01:01:16.347 637.7554 866.0131 328.1476
   aea 01:59:14.108 819.5301 115.053  208.1114
   aea 02:42:44.724 69.38325 641.8554 333.3092

This page may be useful for looking up errors from Kdb+: http://code.kx.com/q/ref/error-list/

Regarding optimising performance of aj see http://code.kx.com/q/ref/joins/#aj-aj0-asof-join

Also, if there isn't an overlap of data between days, it may be faster to run the query on a day by day basis, possibly in parallel.

If there is an overlap of data across days, combining the date & time columns into a single timestamp column would speed up the lookup.

Upvotes: 4

Related Questions