Chris Morgan

Reputation: 1339

Is there a technique with timescaledb to delete rows in order to reduce the frequency of older time-series data?

I'm storing a number of rows in a hypertable. The table size is growing quite large now even in its current test configuration.

I'd like to reduce the frequency of data older than a week from, say, once every 5 seconds to once every 60 seconds by deleting a number of these older records.

Can anyone recommend an approach for doing so, or perhaps an alternative that fits better with the timescaledb design?

Upvotes: 0

Views: 4646

Answers (3)

Thiago Falcao

Reputation: 5023

Take a look at Data Retention.

For example:

SELECT drop_chunks(interval '24 hours', 'conditions');

This will drop all chunks from the hypertable 'conditions' that contain only data older than this duration; it does not delete individual rows within chunks that still hold newer data.
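If simply ageing the old rows out entirely (rather than downsampling them) is acceptable, the same call works with a one-week cutoff. A minimal sketch, assuming a hypothetical hypertable named readings; note that newer TimescaleDB releases (2.x) reverse the argument order and let you schedule the drop as a policy, so check which version you are running:

-- 1.x signature, as in the example above
SELECT drop_chunks(interval '1 week', 'readings');

-- 2.x signature, plus a scheduled retention policy
SELECT drop_chunks('readings', INTERVAL '1 week');
SELECT add_retention_policy('readings', INTERVAL '1 week');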

Upvotes: 1

JasonInVegas

Reputation: 391

Ok, I've read 2 minutes of timescaledb documentation, so I'm an expert, right? Here's what I propose:

  1. You already have a table (I'll call it the business table) and a hypertable with raw 5-second data in it.
  2. Create a second hypertable with the same columns as the first hypertable.
  3. Insert into the 2nd hypertable using a 60-second windowing function and the average, minimum, or maximum of your readings data (you have to decide which aggregation function is meaningful for your case). The insert SQL looks something like:
INSERT INTO minute_table (timestamp, my_reading)
SELECT time_bucket('60 seconds', time) AS the_minute,
       avg(my_raw_reading)  -- or min()/max(), whichever aggregation fits your data
FROM five_second_table
WHERE time < now() - interval '1 week'
GROUP BY the_minute;
  4. Next, delete from the 5-second hypertable any rows whose timestamps fall within the minute buckets already written to the 60-second hypertable (see the sketch below).

  5. Finally, schedule something like this to run every week.

Sorry, I'm not fluent in all the timescaledb functions, but this should get you started on the 'heavy lift' of manually aggregating up from 5-second to 60-second samples.
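A hedged sketch of step 4, reusing the hypothetical table and column names from the insert above (the exact names are assumptions, not anything timescaledb-specific):

DELETE FROM five_second_table f
WHERE f.time < now() - interval '1 week'
  AND EXISTS (
    -- only remove raw rows whose minute bucket has already been aggregated
    SELECT 1
    FROM minute_table m
    WHERE m.timestamp = time_bucket('60 seconds', f.time)
  );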

Upvotes: 1

Mike Freedman

Reputation: 1902

So one of the next releases will have a built-in feature for data retention policies around continuous aggregations, so that you can define a continuous aggregation policy that rolls up secondly data into minutely data, then drops the secondly data that's older than some time period.

(That capability doesn't exist today with continuous aggs, but will very shortly. Right now the best approach is either to have some cron job that deletes old data, or one that copies from one table to a second while aggregating, then calls drop_chunks on the first table.)
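A minimal sketch of that interim copy-then-drop_chunks approach, assuming the same hypothetical five_second_table / minute_table names as the previous answer and the older (1.x) drop_chunks argument order used elsewhere on this page:

-- run weekly (e.g. from cron): aggregate raw rows older than a week, then drop old chunks
INSERT INTO minute_table (timestamp, my_reading)
SELECT time_bucket('60 seconds', time), avg(my_raw_reading)
FROM five_second_table
WHERE time < now() - interval '1 week'
GROUP BY 1;

-- drops only chunks whose data is entirely older than a week
SELECT drop_chunks(interval '1 week', 'five_second_table');

A production job would also track what has already been aggregated, since chunks that straddle the one-week cutoff are kept and their rows could be re-read on the next run.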

Upvotes: 3
