waeiski
waeiski

Reputation: 55

Is it possible to dump/copy PostgreSQL table while new data is inserted into it?

I've set up a PostgreSQL database (version 11.2) in which I have a table into which new entries are inserted at semi-regular intervals (15-30 minutes). The insertion is done with a python script with SQLAlchemy and Pandas with df.to_sql() command. The existing database is quite large and copying/dumping it would most likely take longer than 30 minutes.

Would running CREATE TABLE new_table AS TABLE old_table; interrupt the data insertion process? If yes, is there another way to do this without interruptions?

The database is running on a Red Hat Enterprise server version 7.6. I have admin rights to the whole database and can access it with PuTTy + psql -U username -d my_database and also from pgAdmin if that makes a difference. I haven't tried anything yet for fear of interrupting the collection process.

Upvotes: 1

Views: 1034

Answers (1)

Nihal Sangeeth
Nihal Sangeeth

Reputation: 5525

A transaction in Postgresql any single, all-or-nothing operation. A transaction is said to be atomic: from the point of view of other transactions, it either happens completely or not at all.

The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously.The transaction only becomes real(or visible to other transactions) after it has been committed with the COMMIT; command.

Your first insert command will probably only lock those rows that are currently being inserted. These rows will only be available one the insert transaction has been finished and committed.

To answer your question:

1) Would running CREATE TABLE new_table AS TABLE old_table; interrupt the data insertion process?

No. Not at all.

2) If yes, is there another way to do this without interruptions?

Answered by the first question.

The scenario here is create table command will only copy that data which is not currently locked by any transaction. So for the insert command this will (mostly) only be currently inserted rows. The create table will copy all data thats before insert transaction.

You can check the locks by insert command as following:

select * from pg_stat_activity;

This will output something like.

-[ RECORD 2 ]----+--------------------------------
datid            | 73103
datname          | database
procpid          | 28477
sess_id          | 16424
usesysid         | 10
usename          | user
current_query    | insert .....
waiting          | f
query_start      | 2019-05-20 06:10:21.126825+00
backend_start    | 2019-05-20 05:43:51.600017+00
client_addr      | 0.0.0.0
client_port      | 
application_name | 
xact_start       | 

From this we can ascertain the locks created by this process as:

select * from pg_locks where pid = 28477;

You can update, delete, insert with many transactions on the same table without the transactions blocking each other - provided that each transactions deals with different rows. Two inserts will only block each other if the second one is trying to insert the same primary key (or unique key) values as the first one. If you want no locks to affect your query you can use WITH NOLOCK but use it carefully.

More info at :
https://www.postgresql.org/docs/9.1/explicit-locking.html
https://www.postgresql.org/docs/9.1/transaction-iso.html

Upvotes: 1

Related Questions