The AI Architect

Reputation: 1917

TimescaleDB and PipelineDB Integration

(I was going to post this on the DBA SE, but there's no timescale tag.)

I'm using TimescaleDB to store a real-time stream of about 500-1000 inserts per second, and need to pass this data to a client application as soon as it comes in.

I don't necessarily want to process the raw millisecond data (I may), but I definitely need the client to receive 1, 5, 10, 30 second (etc.) aggregations of this live data, as each time bucket/slot is ready. So, every second, every 5 seconds, every 10 seconds, etc. (the values would be things like max, min, etc. for each time slot).
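The kind of rollup described here maps directly onto TimescaleDB's `time_bucket` function. A hedged sketch of a 5-second aggregation (the table and column names are illustrative, not from the question):

```sql
-- Hypothetical raw hypertable "raw_readings" with columns (ts, value);
-- computes min/max per 5-second bucket over the last minute.
SELECT time_bucket('5 seconds', ts) AS bucket,
       min(value)                   AS min_value,
       max(value)                   AS max_value
FROM   raw_readings
WHERE  ts > now() - INTERVAL '1 minute'
GROUP  BY bucket
ORDER  BY bucket;
```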

I was going to use Postgres triggers to aggregate the data into new tables in a waterfall fashion, and use LISTEN/NOTIFY to tell the client when each time chunk is "ready", but then I came across PipelineDB.
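The trigger + LISTEN/NOTIFY approach sketched above could look like this (hedged; `agg_1s` is a hypothetical 1-second rollup table, and the client would run `LISTEN bucket_ready;`):

```sql
-- Fire a notification whenever a finished bucket row lands in agg_1s.
CREATE FUNCTION notify_bucket_ready() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    PERFORM pg_notify('bucket_ready', NEW.bucket::text);
    RETURN NEW;
END;$$;

CREATE TRIGGER trg_bucket_ready
    AFTER INSERT ON agg_1s
    FOR EACH ROW EXECUTE PROCEDURE notify_bucket_ready();
```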

However, it's not clear to me exactly how to set that architecture up. Should PipelineDB receive the initial data stream and then pass it to the TimescaleDB hypertable? Or should TimescaleDB pass the data to PipelineDB? I assume PipelineDB would be the point of contact for the client.

What would the most performant general architecture be, and how would I pass data between TimescaleDB and PipelineDB?

Upvotes: 3

Views: 1419

Answers (2)

Ancoron

Reputation: 2733

It certainly is possible (using actual structures from a test setup):

  1. create a stream, e.g.:
CREATE FOREIGN TABLE s_tracking (
    c_timestamp TIMESTAMPTZ,
    c_session BIGINT,
    c_request BIGINT,
    c_client VARCHAR,
    c_gender VARCHAR,
    c_path VARCHAR
) SERVER pipelinedb;
  2. create a trigger function that inserts entries, e.g.:
-- forwards each stream row into the (hyper-)table t_tracking
CREATE FUNCTION func_tracking_insert_pipe() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    INSERT INTO
        t_tracking (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
    VALUES
        (NEW.c_timestamp, NEW.c_session, NEW.c_request, NEW.c_client, NEW.c_gender, NEW.c_path);
    RETURN NEW;
END;$$;
  3. create a continuous transform, e.g.:
CREATE VIEW v_t_forward_raw_data
    WITH (action=transform, outputfunc=func_tracking_insert_pipe)
    AS
        SELECT c_timestamp, c_session, c_request, c_client, c_gender, c_path FROM s_tracking;
  4. insert into the stream instead of the table:
INSERT INTO s_tracking
    (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
...

...which will produce all metrics defined for stream s_tracking, while the full raw data also ends up in the (hyper-)table t_tracking.
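The transform above only forwards raw rows; the time-bucketed aggregates the question asks about would come from continuous views over the same stream. A hedged sketch (view name and aggregated expressions are illustrative; column names follow the test setup above):

```sql
-- Continuous view computing 1-second aggregates over the stream.
CREATE VIEW cv_tracking_1s
    WITH (action=materialize)
    AS
        SELECT date_trunc('second', c_timestamp) AS bucket,
               count(*)                          AS n_requests,
               count(DISTINCT c_session)         AS n_sessions
        FROM s_tracking
        GROUP BY bucket;
```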

However, because this workaround relies on a row-level trigger function, every INSERT pays an extra per-row cost.

If you can live with that, the above is probably your best option at the moment.

Upvotes: 2

Derek Nelson

Reputation: 178

Currently there is no native integration between PipelineDB and TimescaleDB, though there likely will be in the not-so-distant future. Performance-wise, the best option right now is to simply bifurcate writes to both TimescaleDB and PipelineDB.
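Bifurcated writes could be as simple as the client issuing two inserts per event. A hedged sketch, assuming both extensions are installed in the same database (PipelineDB 1.x ships as a regular Postgres extension) and reusing the other answer's table/stream names with illustrative values:

```sql
-- One event, written to both destinations in a single transaction.
BEGIN;
INSERT INTO t_tracking (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
VALUES (now(), 1, 1, 'client-a', 'f', '/home');   -- TimescaleDB hypertable
INSERT INTO s_tracking (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
VALUES (now(), 1, 1, 'client-a', 'f', '/home');   -- PipelineDB stream
COMMIT;
```

If the two systems live in separate databases, the single transaction no longer applies and the client has to handle partial-failure semantics itself.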

Upvotes: 2
