PostgreSQL Crashes When Doing Bulk Inserts with Node.js/Sequelize

Question

A Node.js app using Sequelize.js ORM is performing bulk inserts to a PostgreSQL 11.2 server running inside a Docker container on a Mac OSX host system. Each bulk insert typically consists of about 1000-4000 rows, with a bulk insert concurrency of 30, so there is a max of 30 active insert operations at any time.

const bulkInsert = async (payload) => {
    try {
        await sequelizeModelInstance.bulkCreate(payload);
    } catch (e) {
        console.log(e);
    }
}

pLimit = require('p-limit')(30);

(function() => {
    const promises = data.map(d => pLimit(() => bulkInsert(d))) // pLimit() controls Promise concurrency
    const result = await Promise.all(promises)
})();

After some time, the PostgreSQL server will start giving errors Connection terminated unexpectedly, followed by the database system is in recovery mode.

After repeating this several times and checking my logs, it appears that this error usually occurs when performing a batch of 30 bulk inserts where several bulk inserts contain over 100,000 row each. For example, one particular crash occurs when attempting to make 3 bulk inserts of 190000, 650000 and 150000 rows together with 27 inserts of 1000-4000 rows each.

System memory is not full, CPU load is normal, sufficient disk space is available.

Question: Is it normal to expect PostgreSQL to crash under such circumstances? If so, is there a PostgreSQL setting we can tune to allow larger bulk inserts? If this is because of the large bulk inserts, does Sequelize.js have a function to split up the bulk inserts for us?

Running on PostgreSQL 11.2 in docker container, TimescaleDB 1.5.1, node v12.6.0, sequelize 5.21.3, Mac Catalina 10.15.2

PostgreSQL Logs Right After Problem Occurs

2020-01-18 00:58:26.094 UTC [1] LOG:  server process (PID 199) was terminated by signal 9
2020-01-18 00:58:26.094 UTC [1] DETAIL:  Failed process was running: INSERT INTO "foo" ("id","opId","unix","side","price","amount","b","s","serverTimestamp") VALUES (89880,'5007564','1579219200961','front','0.0000784','35','undefined','undefined','2020-01-17 00:00:01.038 +00:00'),.........
2020-01-18 00:58:26.108 UTC [1] LOG:  terminating any other active server processes
2020-01-18 00:58:26.110 UTC [220] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.110 UTC [220] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-18 00:58:26.110 UTC [220] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-18 00:58:26.148 UTC [214] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.148 UTC [214] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-18 00:58:26.148 UTC [214] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-18 00:58:26.149 UTC [203] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.149 UTC [203] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

...

2020-01-18 00:58:30.098 UTC [1] LOG:  all server processes terminated; reinitializing
2020-01-18 00:58:30.240 UTC [223] FATAL:  the database system is in recovery mode
2020-01-18 00:58:30.241 UTC [222] LOG:  database system was interrupted; last known up at 2020-01-18 00:50:13 UTC
2020-01-18 00:58:30.864 UTC [224] FATAL:  the database system is in recovery mode
2020-01-18 00:58:31.604 UTC [225] FATAL:  the database system is in recovery mode
2020-01-18 00:58:32.297 UTC [226] FATAL:  the database system is in recovery mode
2020-01-18 00:58:32.894 UTC [227] FATAL:  the database system is in recovery mode
2020-01-18 00:58:33.394 UTC [228] FATAL:  the database system is in recovery mode
2020-01-18 01:00:55.911 UTC [222] LOG:  database system was not properly shut down; automatic recovery in progress
2020-01-18 01:00:56.856 UTC [222] LOG:  redo starts at 0/197C610
2020-01-18 01:01:55.662 UTC [229] FATAL:  the database system is in recovery mode

Nick · Accepted Answer

Your Postgresql server is probably getting killed by the OOM Killer(Out of Memory Killer) from the Docker OS.

You can:

Increase the memory available to Postgres, 2GB is a low value for the amount of operations you are running.
Reduce your bulk inserts size and limit their concurrency.
Tune your Postgres installation to fit your hardware:
- shared_buffers: This as recommended here should be 25% of the memory in your system, this is a recommendation, you should always benchmark and evaluate your scenario and choose the values that seem fit to your environment.
- work_mem: As explained here:

This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory.

There are a lot of configurations you can change on your Postgres in order to improve performance and stability. Postgres will run just fine with the default settings but in production environments or heavy-load environments, you will need to tune it inevitably.

PostgreSQL Crashes When Doing Bulk Inserts with Node.js/Sequelize

Answers (2)

Related Questions