SCVSH

Reputation: 23

How to drop the oldest entries from a remote TimescaleDB while maintaining a full local backup of the database?

I've got a remote, ever-growing TimescaleDB database. I would like to keep only the most recent entries in that database, backing up the rest of the data to a local drive, so that the database size on the server stays constant.

I thought of making full pg_dump backups before running retention, and rebuilding the database locally from these backups.
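For instance, a minimal sketch of that approach (db_name, dbserver, and local_db are hypothetical placeholders; note that restoring a dump of a TimescaleDB database may need extra steps such as timescaledb_pre_restore()/timescaledb_post_restore(), depending on the version):

    # Take a full custom-format dump of the remote database.
    pg_dump -h dbserver -Fc -f backup.dump db_name

    # Rebuild the database locally from that dump.
    pg_restore -d local_db backup.dump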

Alternatively, I could use WAL-E to maintain a continuous copy, somehow ignoring the deletions on the remote database.

What would be the most efficient way to achieve that?

Upvotes: 1

Views: 476

Answers (2)

SCVSH

Reputation: 23

So I've decided to transfer old data chunk by chunk.

First, I run SELECT show_chunks(older_than => interval '1 day'); to determine the chunks eligible for retention.

Next, I iterate over the selected chunks with \copy _timescaledb_internal.name_of_the_chunk_n to 'chunk_n.csv' csv. After that I use rsync to move the CSV backups to the local drive.
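Put together, the export step looks roughly like this (a sketch only: the hypertable name "conditions", the host names, and the paths are hypothetical, and it assumes TimescaleDB 2.x, where show_chunks takes the hypertable name):

    #!/bin/bash
    # Sketch: export every chunk older than one day to its own CSV file,
    # then sync the files off the server. Runs on the database host.
    DB="db_name"

    # Fully qualified names of the chunks eligible for retention.
    chunks=$(psql -d "$DB" -At \
      -c "SELECT show_chunks('conditions', older_than => INTERVAL '1 day');")

    # \copy runs client-side, so each chunk lands as a local CSV file.
    for chunk in $chunks; do
      psql -d "$DB" -c "\copy $chunk TO '${chunk#_timescaledb_internal.}.csv' CSV"
    done

    # Move the CSV backups to the local machine (placeholder destination).
    rsync -av ./*.csv user@backup-host:/path/to/backups/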

Finally, I've set up a shallow copy of the remote database (by hand, aware of the bug), and use timescaledb-parallel-copy --skip-header --db-name db_name --table table_name --file /path/to/chunk_n --workers 2 --reporting-period 10s to insert the data into the local database.

Upvotes: 1

Mike Freedman

Reputation: 1902

(TimescaleDB person here)

There are two main approaches here:

  • Use a backup system like WAL-E or pgBackRest to continuously replicate data to another location (like S3).
  • Integrate your use of TimescaleDB's drop_chunks with your data extraction process.

The answer somewhat depends on how complex your data/database is.

If you are primarily looking to archive the data in a single hypertable, I would recommend the latter: use show_chunks to determine which chunks fall before a certain cutoff, run a SELECT over that range and write the data wherever you like, and then execute drop_chunks over the same range.
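For a single hypertable, that cycle looks roughly like this (a sketch, not a drop-in script: "conditions", the time column name, and the output path are hypothetical, and the calls assume TimescaleDB 2.x signatures):

    #!/bin/bash
    # Sketch of the show_chunks / extract / drop_chunks cycle.
    DB="db_name"

    # 1. Inspect which chunks lie entirely before the cutoff.
    psql -d "$DB" -c "SELECT show_chunks('conditions', older_than => INTERVAL '1 day');"

    # 2. Extract the same range and write it wherever you archive data.
    psql -d "$DB" \
      -c "\copy (SELECT * FROM conditions WHERE time < now() - INTERVAL '1 day') TO 'archive.csv' CSV"

    # 3. Drop the old chunks. drop_chunks only removes chunks that end
    #    before the cutoff, so it never drops rows the extract missed.
    psql -d "$DB" -c "SELECT drop_chunks('conditions', older_than => INTERVAL '1 day');"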

Upvotes: 0
