Kyle Durnam

Reputation: 87

How to fire an update per new row in kdb+/q?

I am trying to create a service whereby irregularly spaced events from several sensors are aggregated into 1-second time buckets, i.e. a fixed sample rate, since publishing every raw event would be computationally and memory intensive. When a second passes, i.e. a new bucketed row is formed, I would like to fire an event and publish that latest row to subscribers, following the kdb+ tick architecture.

From what I understand of stream processing, one should buffer the events before running an aggregation. My question is how to implement this in kdb+/q: after each time interval completes (1 second in this instance), an aggregation is performed on the latest buffered data, the results are appended to a table and sent to subscribers so that they receive regularly spaced aggregations of the irregular sensor data, and the buffer is cleared in the process.
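To illustrate what I mean by regularly spaced aggregations, here is a rough sketch of the bucketing step on its own (the table, column and sensor names below are just placeholders I made up):

/ irregularly spaced sensor events (placeholder data)
readings:([] time:0D00:00:00.10 0D00:00:00.75 0D00:00:01.20; sym:3#`sensor1; val:1.0 2.0 3.0)
/ one averaged row per sensor per second
select avg val by sec:time.second, sym from readings

What I am unsure about is how to run something like this on a rolling buffer every second and publish each result as it is produced.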

So in simple terms:

1) How can one implement buffering functionality that collects irregularly spaced events while staying within memory constraints?

2) How can one accurately aggregate the irregularly spaced events at a fixed interval, forming regularly spaced aggregations of the buffered events, and send each aggregation to subscribers?

(My interpretation of the required functionality could be completely wrong; if there is a better way to achieve this, that would be great!)

Your advice on this matter would be truly appreciated.

Thanks.

Upvotes: 3

Views: 623

Answers (1)

Callum Biggs

Reputation: 1540

Terry is correct in his comments: a TP will batch in periods specified by the timer parameter supplied when launching the process. For example, to run a vanilla TP with 1-second batching (where sym is the schema file and . is the directory for the TP log):

q tick.q sym . -p 5010 -t 1000

I would strongly recommend against doing any processing within the TP; it should act purely as a point of ingress and the creator of logs for recovery. Whether to run the TP in batch mode or in zero-latency mode (not supplying a -t parameter) depends on the nature of the updates you receive; the whitepaper on tickerplant throughput optimization is your best bet for advice here.

You could have chained tickerplants in which you perform further aggregations or computations, but I would personally opt to have a Realtime Engine (RTE) operating as follows:

  1. The RTE is a subscriber of the TP for the table to be aggregated (e.g., trades)
  2. Aggregations could be performed on a .z.ts timer or be triggered from the time data of the table, the former being much easier to implement. The results go to a separate aggregation table, distinct from the original data table, say tradesAgg (a sketch follows this list)
  3. The final stage of each aggregation would be publishing the result back to the TP. This ensures your TP log file contains the history of all data that has entered or been produced by the system
  4. Your RDB would be subscribed to this aggregation table, and through the RDB the data would enter your historical database
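A minimal sketch of such an RTE, assuming a vanilla tick.q TP on port 5010 and a trades table of (time;sym;price;size); the OHLC-style aggregation, the column names and the tradesAgg name are all illustrative, and tradesAgg would also need to be defined in the TP's schema file:

/ connect to the TP and subscribe to trades for all syms
h:hopen `::5010
h(".u.sub";`trades;`)

/ in-memory buffer matching the trades schema
buffer:([] time:`timespan$(); sym:`symbol$(); price:`float$(); size:`long$())

/ called by the TP with (table name; new records); just buffer them
upd:{[t;x]`buffer insert x}

/ every second: aggregate the buffer, publish back to the TP, clear the buffer
.z.ts:{[tm]
 if[count buffer;
  agg:select open:first price, high:max price, low:min price,
   close:last price, vol:sum size by sec:time.second, sym from buffer;
  / vanilla tick.q's .u.upd prepends a time column before logging/publishing
  neg[h](".u.upd";`tradesAgg;value flip 0!agg);
  delete from `buffer]}

\t 1000

Publishing via .u.upd (rather than inserting locally) is what gets the aggregated rows into the TP log and out to the RDB and any other subscribers.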

Upvotes: 1
