Reputation: 1917
(I was going to post this on the DBA SE, but there's no timescale tag.)
I'm using TimescaleDB to store a real-time stream of about 500-1000 inserts per second, and need to pass this data to a client application as soon as it comes in.
I don't necessarily want to process the raw millisecond-level data (though I may), but I definitely need the client to receive 1-, 5-, 10-, 30-second (etc.) aggregations of this live data as each time bucket/slot becomes ready. So, every second, every 5 seconds, every 10 seconds, and so on (the values would be things like max, min, etc. for each time slot).
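For example, assuming a hypertable named readings with columns ts and val (names are just for illustration), the kind of 5-second aggregation I mean would look like this in TimescaleDB:

```sql
-- 5-second buckets over recent data (hypothetical table/column names)
SELECT time_bucket('5 seconds', ts) AS bucket,
       min(val) AS min_val,
       max(val) AS max_val
FROM readings
WHERE ts > now() - interval '1 minute'
GROUP BY bucket
ORDER BY bucket;
```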
I was going to use Postgres triggers to roll the data up into new tables in a waterfall-type manner, and use listen/notify to tell the client when each time chunk is "ready", but then I came across PipelineDB.
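The trigger/notify part of that plan would have looked roughly like this (table and channel names are placeholders, not an existing setup):

```sql
-- Hypothetical sketch: notify listeners when a finished bucket row is written
CREATE FUNCTION notify_bucket_ready() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
    -- the payload carries the bucket timestamp so the client knows which slot is ready
    PERFORM pg_notify('bucket_ready', NEW.bucket::text);
    RETURN NEW;
END;$$;

CREATE TRIGGER trg_bucket_ready
AFTER INSERT ON agg_5s
FOR EACH ROW EXECUTE PROCEDURE notify_bucket_ready();
```

The client would then run LISTEN bucket_ready; and query the aggregate table on each notification.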
However, it's not clear to me exactly how to set that architecture up. Should PipelineDB receive the initial data stream and then pass it on to the TimescaleDB hypertable? Should TimescaleDB pass the data to PipelineDB? I assume PipelineDB would be the point of contact for the client.
What would the most performant general architecture be, and how would I pass data between TimescaleDB and PipelineDB?
Upvotes: 3
Views: 1419
Reputation: 2733
It certainly is possible (using actual structures from a test setup):
-- In PipelineDB 1.x, a stream is declared as a foreign table on the pipelinedb server
CREATE FOREIGN TABLE s_tracking (
    c_timestamp TIMESTAMPTZ,
    c_session   BIGINT,
    c_request   BIGINT,
    c_client    VARCHAR,
    c_gender    VARCHAR,
    c_path      VARCHAR
) SERVER pipelinedb;
-- Row-level trigger function that copies each stream row into the hypertable
CREATE FUNCTION func_tracking_insert_pipe() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
    INSERT INTO
        t_tracking (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
    VALUES
        (NEW.c_timestamp, NEW.c_session, NEW.c_request, NEW.c_client, NEW.c_gender, NEW.c_path);
    RETURN NEW;
END;$$;
-- Transform view: invokes the output function for every row arriving on the stream
CREATE VIEW v_t_forward_raw_data
WITH (action=transform, outputfunc=func_tracking_insert_pipe)
AS
    SELECT c_timestamp, c_session, c_request, c_client, c_gender, c_path
    FROM s_tracking;
-- All writes go to the stream; the transform forwards them to t_tracking
INSERT INTO s_tracking
    (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
...
...which will produce all metrics defined on the stream s_tracking, as well as write the full raw data into the (hyper-)table t_tracking.
However, because this workaround uses a row-level trigger function, there will be a penalty on every INSERT.
If you can live with that, the above is probably your best option at the moment.
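For completeness, the "metrics as defined for stream s_tracking" would be ordinary PipelineDB continuous views over the same stream. An illustrative sketch (the view name and the 5-second bucketing expression are mine, not from the test setup):

```sql
-- Continuous view aggregating the stream into 5-second buckets
CREATE VIEW v_tracking_5s WITH (action=materialize) AS
SELECT
    -- round each timestamp down to the nearest 5-second boundary
    to_timestamp(floor(extract(epoch FROM c_timestamp) / 5) * 5) AS bucket,
    count(*) AS n_requests,
    count(DISTINCT c_session) AS n_sessions
FROM s_tracking
GROUP BY bucket;
```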
Upvotes: 2
Reputation: 178
Currently there is no native integration between PipelineDB and TimescaleDB, though there likely will be in the not-so-distant future. Performance-wise, the best option for now is to simply bifurcate writes to both TimescaleDB and PipelineDB.
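If both extensions live in the same database, bifurcating writes can be as simple as the ingest process inserting each row into both the hypertable and the stream in one transaction (a sketch reusing the other answer's table names; the values are made up):

```sql
BEGIN;
-- raw row into the TimescaleDB hypertable
INSERT INTO t_tracking (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
VALUES (now(), 1, 1, 'client-a', 'f', '/home');
-- same row into the PipelineDB stream for continuous aggregation
INSERT INTO s_tracking (c_timestamp, c_session, c_request, c_client, c_gender, c_path)
VALUES (now(), 1, 1, 'client-a', 'f', '/home');
COMMIT;
```

If they run in separate databases, the application would instead write over two connections, accepting that the two sinks are no longer atomically consistent.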
Upvotes: 2