Reputation: 1
I have a PostgreSQL cluster composed of four machines, configured as follows:
Three PostgreSQL nodes managed by Patroni and etcd, using PgBouncer; one of these nodes is the primary and the other two are replicas. One machine running HAProxy to accept connections and perform load balancing.
We have a database with a table containing millions of rows. This table is a hypertable created with TimescaleDB. In our organization we collect data from meters, which we store in this hypertable, resulting in thousands of rows being inserted daily.
However, the database cannot handle this load. The number of locks keeps increasing, and RAM usage grows continuously as well, eventually causing the primary node to crash.
For context, each node has 12GB of RAM and 16 CPU cores.
What could be causing the continuous increase in locks and RAM usage? What are the best practices to handle high insert rates in a TimescaleDB hypertable? How can we optimize our configuration to prevent the primary node from crashing?
I appreciate any insights or recommendations on how to address these performance issues. Thank you!
I am a young developer with little experience, and I'm doing my best to explain the situation.
PostgreSQL configuration with Patroni:
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  max_parallel_workers: 16
  max_parallel_workers_per_gather: 4
  parameters:
    archive_mode: false
    autovacuum: true
    autovacuum_max_workers: 5
    autovacuum_naptime: 1min
    autovacuum_vacuum_cost_delay: 20ms
    effective_cache_size: 8GB
    effective_io_concurrency: 4
    hot_standby: true
    log_directory: /var/log/postgresql/pg_log
    log_filename: postgresql.log
    log_hostname: true
    log_statement: all
    logging_collector: true
    maintenance_work_mem: 512MB
    max_connections: 3000
    max_locks_per_transaction: 1024
    max_prepared_transactions: 0
    max_wal_senders: 10
    max_worker_processes: 16
    shared_buffers: 4096MB
    shared_preload_libraries: timescaledb
    wal_keep_segments: 128
    wal_level: hot_standby
    wal_sender_timeout: 60s
    work_mem: 64MB
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30
PgBouncer configuration:
logfile = /var/log/postgresql/pgbouncer.log
pidfile = /var/run/postgresql/pgbouncer.pid
;; IP address or * which means all IPs
listen_addr = 0.0.0.0
listen_port = 6432
unix_socket_dir = /var/run/postgresql
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 20
reserve_pool_size = 20
reserve_pool_timeout = 2
server_idle_timeout = 5
Here is the pg_locks view from pgAdmin:
Upvotes: 0
Views: 142
Reputation: 311
First, I'd suggest checking whether you have blocking. Some locking can be normal, depending on your database activity, but the important thing is not to have blocking, where one process is waiting for another.
In your case I see some exclusive locks, which can cause blocking.
You can check it by running, for example, the following query:
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
Depending on your PostgreSQL version, this query may differ slightly, but you can find the correct variant for your version and check its output.
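On PostgreSQL 9.6 and later there is also a simpler built-in helper, pg_blocking_pids(), which returns the PIDs of the sessions blocking a given backend. A minimal sketch (the column list is just a suggestion):

```sql
-- For each waiting backend, show which PIDs are blocking it (PostgreSQL 9.6+).
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
  FROM pg_catalog.pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

This avoids the self-join on pg_locks entirely and is usually easier to read under load.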
Also, I agree that max_connections looks pretty big if you're using PgBouncer. What is the actual number of connections in use? Is PgBouncer running on the same server as the database?
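To check actual connection usage, counting backends by state in pg_stat_activity gives a quick picture (the backend_type filter assumes PostgreSQL 10+):

```sql
-- Count client connections per state; compare the total against
-- max_connections (3000 here) and PgBouncer's default_pool_size (20).
SELECT state, count(*) AS connections
  FROM pg_catalog.pg_stat_activity
 WHERE backend_type = 'client backend'
 GROUP BY state
 ORDER BY connections DESC;
```

If most connections sit idle, it's the PgBouncer pool sizes, not max_connections, that should govern; a much lower max_connections typically reduces per-backend memory pressure.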
Upvotes: 0
Reputation: 5161
PostgreSQL was designed as a multiprocess system (one OS process per backend, the equivalent of separate executables), not with a thread model like other RDBMSs such as SQL Server. The result is excessive consumption of resources of all kinds, which makes it poorly suited to absorbing a heavy concurrent load.
https://www.postgresql.org/message-id/31cc6df9-53fe-3cd9-af5b-ac0d801163f4%40iki.fi
Having more than 300 users with high concurrency in a single PG cluster is out of PG's scope... even with some external pooling...
For another reason, PostgreSQL's MVCC model keeps row versions inside the table itself (other RDBMSs such as SQL Server or Oracle use external copies), so it needs a cleaner called VACUUM, which takes many locks and causes contention on resources.
https://www.cs.cmu.edu/~pavlo/blog/2023/04/the-part-of-postgresql-we-hate-the-most.html
One thing you can try is to add much more hardware resources to compensate for the heavy PG process model.
Also, PostgreSQL has no in-memory tables (which are 10 to 30 times faster than persistent tables), is very poor in terms of query parallelism (PG is not able to parallelize INSERT, UPDATE, or DELETE statements), and does not offer columnar tables or columnar indexes, which are essential when dealing with thousands of millions of rows...
Upvotes: -1