Reputation: 1260

KDB: How to serialize a table for a union join within kdb-tick architecture?

I'm trying to modify the kdb-tick architecture to support a union join between incoming data and the local RDB table. I have modified the upd function in tick.q to the following:

ups:{[t;x]ts"d"$a:.z.P;
    if[not -16=type first first x;a:"n"$a;x:$[0>type first x;a,x;(enlist(count first x)#a),x]];
    f:key flip value t;pub[t;$[0>type first x;enlist f!x;flip f!x]];if[l;l enlist (`ups;t;x);i+:1];};

ups:uj is subsequently set in the subscriber files. My question is how to serialize a table row before publishing it through the .u.ups function. For example, given a table:

second     |  amount price 
-----------|----------------
02:46:01   |  54     9953.5
02:46:02   |  54     9953.5
02:46:03   |  54     9953.5
02:46:04   |  150    9953.5    
02:46:05   |  150    9954.5

How should one serialize the first row (02:46:01 | 54 9953.5) so that it can be sent via the .u.ups function to subscribers, where uj will be run between the row and the local table? Thanks in advance for your advice.

Upvotes: 0

Answers (2)

Callum Biggs

Reputation: 1540

I'm going to assume this is related to your previous few questions about disparate schemas. I'd like to suggest an alternative solution, which is only truly viable if you are using kdb+ version 3.6, which supports anymap. If you can narrow your schemas down to a minimal list of common columns, all other columns can be placed as dictionaries into a general column.

q)tab:([]sym:`$();col1:`float$();colGeneral:(::))
q)`tab upsert (`AAPL;3.454;(`colX`colY`colZ!(1;2.3;"abc")))
`tab
q)`tab upsert (`MSFT;3.0;(`colX`colY!(2;100.0)))
`tab
q)`tab upsert (`AMZN;100.0;((enlist `colX)!(enlist 10)))
`tab
q)tab
sym  col1  colGeneral
----------------------------------------
AAPL 3.454 `colX`colY`colZ!(1;2.3;"abc")
MSFT 3     `colX`colY!(2;100f)
AMZN 100   (,`colX)!,10
q)select colGeneral from tab
colGeneral
-----------------------------
`colX`colY`colZ!(1;2.3;"abc")
`colX`colY!(2;100f)
(,`colX)!,10
q)select sym, colGeneral @\: `colX from tab
sym  x
-------
AAPL 1
MSFT 2
AMZN 10
q)select sym, colGeneral @\: `colY from tab
sym  x
---------
AAPL 2.3
MSFT 100f
AMZN 0N

With 3.6 you can save this to disk in any splayed format (splayed, partitioned, segmented) and still easily query the data. The storage of such a table will likely be sub-optimal due to the poor compression characteristics of the general column (assuming you wish to compress data), but it will be perfectly functional.

Integrating uj into the standard ingestion procedure on each update will be computationally expensive. Using a general column with the dictionary method will massively improve your ingestion speed. Below is a demonstration using the example given in a previous answer to a related question of yours:

q)table:()
q)row1:enlist `x`y`colX!(`AMZN;100.0;10)
q)table:table uj row1
q)\ts:100000 table:table uj row1
13828 6292352
q)\ts:100000 `tab upsert (`AMZN;100.0;((enlist `colX)!(enlist 10)))
117   12746880

Upvotes: 1

terrylynch

Reputation: 13572

Some of this might help:

You can't set ups:uj in the subscribers because the table name is passed as a symbol, so the subscriber will effectively try to do

uj[`tab1;tab2]

which won't work because uj doesn't accept table names (symbols) as input. You would have to instead set ups to

ups:{x set value[x] uj y}
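
As a minimal sketch of how that wrapper behaves on a subscriber (the table name trade and its columns are made up for illustration, they are not from the question):

```q
/ hypothetical local table on the subscriber
trade:([]sym:`AAPL`MSFT;price:1.0 2.0)
/ subscriber callback: look the table up by name, union-join the update, store the result back
ups:{x set value[x] uj y}
/ incoming update that carries an extra column, size
ups[`trade;([]sym:enlist`AMZN;price:enlist 3.0;size:enlist 10)]
/ trade now has the columns sym, price and size; the two earlier rows hold null size
cols trade
```

Because uj rebuilds the whole table on every call, this is the per-update cost the other answer measures.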

A standard tickerplant is not designed to handle a variable or changing schema, and for good reason: it's generally not a good idea to have a schema that changes intraday. However, your situation might warrant it, in which case you'd need to modify your .u.ups function to something like

\d .u
ups:{[t;x]ts"d"$a:.z.P;
    x:`time xcols update time:"n"$a from x;
    pub[t;$[98h=type x;x;1=count last x;enlist x;flip x]];if[l;l enlist (`ups;t;x);i+:1];};
\d .

and your feeder process would have to send kdb tables or kdb dictionaries to the .u.ups function. Since a feedhandler process is usually not a kdb process, it may or may not be possible to send tables/dictionaries to the tickerplant as normally the feedhandler would send lists (without column metadata). In your case you need to somehow supply the column metadata to the tickerplant on each update (or maybe you're doing that already?), as otherwise it won't know which columns are which.
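
For illustration, if the feeder is itself a q process and knows the column names for a record, attaching them is a one-liner. The names c, rec and recs below are hypothetical; a non-kdb feedhandler would have to build the equivalent dictionary or table through whatever kdb interface it uses:

```q
/ hypothetical column names known to the feeder for this update
c:`sym`price`size
/ a single record held as a plain list becomes a dictionary
rec:c!(`AMZN;100.0;10)
/ several records held as column lists become a table
recs:flip c!(`AMZN`MSFT;100.0 3.0;10 20)
```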

In other words, your feeder process could send any of the following:

(`.u.ups;`tab;([]col1:`a`b`c;col2:1 2 3))
(`.u.ups;`tab;`col1`col2!(`a;1))
(`.u.ups;`tab;`col1`col2!(`a`b;1 2))
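
A rough sketch of how a receiver could coerce those three payload shapes to a table before publishing or joining. The helper name norm is hypothetical and not part of standard kdb-tick. It assumes a single-row dictionary holds atoms, so a single row containing a string field would need extra handling:

```q
/ norm: hypothetical helper that returns a table for any of the three shapes
/ table -> unchanged; dictionary of atoms -> one-row table; dictionary of lists -> flipped to a table
norm:{$[98h=type x;x;0>type first value x;enlist x;flip x]}

t1:norm ([]col1:`a`b`c;col2:1 2 3)
t2:norm `col1`col2!(`a;1)
t3:norm `col1`col2!(`a`b;1 2)
/ t1, t2 and t3 are all tables with columns col1 and col2
```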

Upvotes: 1
