rgwozdz

Reputation: 1143

How to optimize Postgresql max_connections and node-postgres connection pool?

In brief, I am having trouble supporting more than 5000 read requests per minute from a data API leveraging Postgresql, Node.js, and node-postgres. The bottleneck appears to be between the API and the DB. Here are the implementation details.

I'm using an AWS Postgresql RDS database instance (m4.4xlarge - 64 GB mem, 16 vCPUs, 350 GB SSD, no provisioned IOPS) for a Node.js powered data API. By default the RDS's max_connections=5000. The node API is load-balanced across two clusters with 4 processes each (2 EC2s with 4 vCPUs running the API with PM2 in cluster-mode). I use node-postgres to bind the API to the Postgresql RDS, and am attempting to use its connection pooling feature. Below is a sample of my connection pool code:

var Pool = require('pg').Pool; // node-postgres

var pool = new Pool({
    user: settings.database.username,
    password: settings.database.password,
    host: settings.database.readServer,
    database: settings.database.database,
    max: 25,                  // max clients per process in this pool
    idleTimeoutMillis: 1000   // release idle clients after 1 second
});

/* Example of pool usage */
pool.query('SELECT my_column FROM my_table', function(err, result){
    
    /* Callback code here */
});

Using this implementation and testing with a load tester, I can support about 5000 requests over the course of one minute, with an average response time of about 190ms (which is what I expect). As soon as I fire off more than 5000 requests per minute, my response time increases to over 1200ms in the best of cases and in the worst of cases the API begins to frequently timeout. Monitoring indicates that for the EC2s running the Node.js API, CPU utilization remains below 10%. Thus my focus is on the DB and the API's binding to the DB.

I have attempted to increase (and decrease for that matter) the node-postgres "max" connections setting, but there was no change in the API response/timeout behavior. I've also tried provisioned IOPS on the RDS, but no improvement. Also, interestingly, I scaled the RDS up to m4.10xlarge (160 GB mem, 40 vCPUs), and while the RDS CPU utilization dropped greatly, the overall performance of the API worsened considerably (it couldn't even support the 5000 requests per minute that I was able to with the smaller RDS).

I'm in unfamiliar territory in many respects and am unsure of how to best determine which of these moving parts is bottlenecking API performance when over 5000 requests per minute. As noted, I have attempted a variety of adjustments based on a review of the Postgresql configuration documentation and node-postgres documentation, but to no avail.

If anyone has advice on how to diagnose or optimize I would greatly appreciate it.

UPDATE

After scaling up to m4.10xlarge, I performed a series of load tests, varying the number of requests/min and the max number of connections in each pool. Here are some screen captures of monitoring metrics:

[Screenshot: monitoring metrics]

[Screenshot: db connections]

Upvotes: 10

Views: 14010

Answers (3)

vitaly-t

Reputation: 25820

The best way is to make use of a separate Pool for each class of API call, based on the call's priority:

const { Pool } = require('pg');

const highPriority = new Pool({max: 20}); // pool for high-priority API calls
const lowPriority = new Pool({max: 5});   // pool for low-priority API calls

Then you just use the right pool for each of the API calls, for optimum service/connection availability.
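For example, a minimal sketch of routing queries through the two pools; the function names, table names, and queries below are purely illustrative:

/* Illustrative only: user-facing lookups go through the larger pool,
   heavier reporting queries go through the smaller one. */
function getUserById(id, callback) {
    highPriority.query('SELECT * FROM users WHERE id = $1', [id], callback);
}

function getMonthlyReport(callback) {
    lowPriority.query('SELECT * FROM monthly_reports', callback);
}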

Upvotes: 4

rotten

Reputation: 1630

Since you are interested in read performance, you can set up replication between two (or more) PostgreSQL instances, and then use pgpool II to load balance between the instances.

Scaling horizontally means you won't start hitting the max instance sizes at AWS if you decide next week you need to go to 10,000 concurrent reads.

You also start to get some HA in your architecture.

--

Many times people will use pgbouncer as a connection pooler even if they already have one built into their application code. pgbouncer works really well and is typically easier to configure and manage than pgpool, but it doesn't do load balancing. I'm not sure if it would help you very much in this scenario though.
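If you did try pgbouncer, the application-side change is small; a sketch, assuming pgbouncer runs on a host you manage and listens on its conventional port 6432 (the host name below is made up):

var Pool = require('pg').Pool;

var pool = new Pool({
    user: settings.database.username,
    password: settings.database.password,
    host: 'pgbouncer.internal.example',   // assumption: your pgbouncer host
    port: 6432,                           // pgbouncer's conventional listen port
    database: settings.database.database,
    max: 25
});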

Upvotes: 0

cohenjo

Reputation: 586

In order to support more than 5k requests, while maintaining the same response rate, you'll need better hardware...

The simple math states that: 5000 requests × 190 ms avg = 950,000 ms of DB time per minute, divided across 16 cores ≈ 60,000 ms per core - which is essentially the entire minute, meaning your system was highly loaded.
(I'm guessing you had some spare CPU as some time was lost on networking)

Now, the really interesting part of your question comes from the scale-up attempt: m4.10xlarge (160 GB mem, 40 vCPUs).
The drop in CPU utilization indicates that the scale-up freed up DB resources - so you need to push more requests!
2 suggestions:

  • Try increasing the connection pool to max: 70 and look at the network traffic (depending on the amount of data, you might be hogging the network)
  • Also, are your requests to the DB issued asynchronously from the application side? Make sure your app can actually push more requests (see the sketch after this list).
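A sketch of both suggestions combined, reusing the settings object from the question; the max of 70 is simply the number suggested above (not a measured optimum), and the Express-style handler is illustrative:

var Pool = require('pg').Pool;

var pool = new Pool({
    user: settings.database.username,
    password: settings.database.password,
    host: settings.database.readServer,
    database: settings.database.database,
    max: 70,                  // larger per-process pool (suggestion 1)
    idleTimeoutMillis: 1000
});

/* Suggestion 2: issue independent queries concurrently; the pool queues
   them and fans them out over up to `max` connections per process. */
function handleRequest(req, res) {
    pool.query('SELECT my_column FROM my_table', function (err, result) {
        if (err) { return res.status(500).json({error: 'query failed'}); }
        res.json(result.rows);
    });
}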

Upvotes: 7
