We receive quotes from an exchange and store them in a KDB ticker plant. We want to analyse volume in the RDB and HDB with minimum impact on the performance of these databases, since they are also used by other teams.
I'll make some assumptions about table and column names, which I'm sure you can extrapolate from.
We receive quotes from exchange and store them in KDB Ticker Plant
As a matter of definition, a tickerplant only stores data for a very short time: it logs each update to file and then fires the data off to the RDB (and any other listeners).
with minimum impact on performance of these databases
It all depends on (a) your data volume and (b) the most optimal where clause. It also depends on whether you have enough RAM on your machine to cope with the queries. The closer you get to the machine's memory limit, the harder it is for the OS to allocate memory, and therefore the longer the query takes (although memory-allocation time pales in comparison to getting data off a disk, so disk speed is also a factor).
Firstly, how may we create a function which splits a day into 10-minute intervals and, for each interval, creates a stat with volume?
Your friend here is xbar: http://code.kx.com/q/ref/arith-integer/#xbar
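A quick sketch of what xbar does to a time column (the values here are purely illustrative):

`minute$09:03:27 09:11:05 09:58:59   / cast times to minutes: 09:03 09:11 09:58
10 xbar 09:03 09:11 09:58            / round down to 10-minute buckets: 09:00 09:10 09:50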
getBy10MinsRDB:{[instrument;mkt]
 / each aggregate must be given a name, otherwise q raises a 'dup error for repeated column names
 select maxVol:max volume, minVol:min volume, sumVol:sum volume, avgVol:avg volume by 10 xbar `minute$time from table where sym=instrument, market=mkt
 };
For an HDB (date-parted), the most optimal query filters on date first, then sym, then time. In your case you haven't asked about time, so I omit it.
getBy10MinsHDB:{[dt;instrument;mkt]
 / where clause ordered date, then sym, to match the partitioning
 select maxVol:max volume, minVol:min volume, sumVol:sum volume, avgVol:avg volume by 10 xbar `minute$time from table where date=dt, sym=instrument, market=mkt
 };
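Assuming the table and column names above, hypothetical calls might look like this (the sym and market are made up):

getBy10MinsRDB[`VOD.L;`LSE]             / today's quotes, from the RDB
getBy10MinsHDB[2015.01.15;`VOD.L;`LSE]  / one historical date, from the HDB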
Should we extract records in a loop portion by portion or in one go with one query?
No, that's the absolute worst way of doing things in KDB :-) there's almost always a nicer vectorised solution.
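To illustrate the difference (table and sym names are assumed): a loop issues one query per date, whereas a single where clause using within lets kdb+ do the work in one pass:

/ looping, one query per date - avoid this
raze {[d] select from table where date=d, sym=`VOD.L} each 2015.01.01 + til 5
/ vectorised, one query over the whole range - preferred
select from table where date within 2015.01.01 2015.01.05, sym=`VOD.L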
We have around 150 million records for each day in our database.
Since KDB is a columnar database, the types of the columns you have are as important as the number of records, as both determine memory usage.
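As a rough back-of-envelope illustration: a single float column over a 150-million-row day already costs on the order of 1.2 GB of memory, before any other columns:

150000000 * 8   / bytes for one float64 column over one day, i.e. ~1.2 GB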
because they are also used by other teams
If simple queries like the above are causing issues, you may need to split the table up, perhaps by market, to reduce query clash and load. If memory isn't an issue, consider -s on HDBs for multithreaded queries (over multiple days). Consider a negative port number on the HDB for a multithreaded input queue, to minimise query clash (although it doesn't necessarily make things faster).
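For example, a hypothetical HDB launch combining both flags (the path and numbers are illustrative):

q /path/to/hdb -s 4 -p -5012   / -s 4: four secondary threads; -p -5012: listen on 5012 with a multithreaded input queue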