We receive quotes from an exchange and store them in a KDB ticker plant. We want to analyse volume in the RDB and HDB with minimum impact on the performance of these databases, since they are also used by other teams.
I'll make some assumptions about table and column names, which I'm sure you can extrapolate from.
We receive quotes from exchange and store them in KDB Ticker Plant
As a matter of definition, a tickerplant only stores data for a very short time: it logs each update to file and then fires the data off to the RDB (and any other listeners).
with minimum impact on performance of these databases
It all depends on (a) your data volume and (b) the most optimal where clause. It also depends on whether you have enough RAM on your machine to cope with the queries. The closer you get to the machine's memory limit, the harder it is for the OS to allocate memory, and therefore the longer the query takes (although memory-allocation time pales in comparison to getting data off a disk, so disk speed is also a factor).
Firstly, how may we create a function which splits a day into 10-minute intervals and, for each interval, creates a stat with volume?
Your friend here is xbar: http://code.kx.com/q/ref/arith-integer/#xbar
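A quick sketch of what xbar does to a time column (the values here are purely illustrative):

`minute$09:03:27 09:11:05 09:58:59   / cast times to minutes: 09:03 09:11 09:58
10 xbar 09:03 09:11 09:58            / round down to 10-minute buckets: 09:00 09:10 09:50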
getBy10MinsRDB:{[instrument;mkt]
 / each aggregate must be given a name, otherwise q raises a 'dup error for repeated column names
 select maxVol:max volume, minVol:min volume, sumVol:sum volume, avgVol:avg volume by 10 xbar `minute$time from table where sym=instrument, market=mkt
 };
For an HDB (date-parted), the most optimal query filters on date first, then sym, then time. In your case you haven't asked about time, so I omit it.
getBy10MinsHDB:{[dt;instrument;mkt]
 / where clause ordered date, then sym, to match the partitioning
 select maxVol:max volume, minVol:min volume, sumVol:sum volume, avgVol:avg volume by 10 xbar `minute$time from table where date=dt, sym=instrument, market=mkt
 };
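Assuming the table and column names above, hypothetical calls might look like this (the sym and market are made up):

getBy10MinsRDB[`VOD.L;`LSE]             / today's quotes, from the RDB
getBy10MinsHDB[2015.01.15;`VOD.L;`LSE]  / one historical date, from the HDB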
Should we extract records in a loop portion by portion or in one go with one query?
No, that's the absolute worst way of doing things in KDB :-) there's almost always a nicer vectorised solution.
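To illustrate the difference (table and sym names are assumed): a loop issues one query per date, whereas a single where clause using within lets kdb+ do the work in one pass:

/ looping, one query per date - avoid this
raze {[d] select from table where date=d, sym=`VOD.L} each 2015.01.01 + til 5
/ vectorised, one query over the whole range - preferred
select from table where date within 2015.01.01 2015.01.05, sym=`VOD.L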
We have around 150 million records for each day in our database.
Since KDB is a columnar database, the types of the columns you have are as important as the number of records, as both determine memory usage.
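As a rough back-of-envelope illustration: a single float column over a 150-million-row day already costs on the order of 1.2 GB of memory, before any other columns:

150000000 * 8   / bytes for one float64 column over one day, i.e. ~1.2 GB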
because they are also used by other teams
If simple queries like the above are causing issues, you may need to split the table up, perhaps by market, to reduce query clash and load. If memory isn't an issue, consider -s on HDBs for multithreaded queries (over multiple days). Consider a negative port number on the HDB for a multithreaded input queue, to minimise query clash (although it doesn't necessarily make things faster).
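For example, a hypothetical HDB launch combining both flags (the path and numbers are illustrative):

q /path/to/hdb -s 4 -p -5012   / -s 4: four secondary threads; -p -5012: listen on 5012 with a multithreaded input queue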