Reputation: 962
I have been practising bulk loading using Laurenz Albe's blog https://www.cybertec-postgresql.com/en/postgresql-bulk-loading-huge-amounts-of-data/. The problem started when I changed it a little bit. The table is the same:
CREATE TABLE t_sample
(
a varchar(50),
b int,
c varchar(50),
d int
);
Instead of
BEGIN;
INSERT INTO t_sample VALUES ('abcd', 1, 'abcd', 1);
INSERT INTO t_sample VALUES ('abcd', 1, 'abcd', 1);
INSERT INTO t_sample VALUES ('abcd', 1, 'abcd', 1);
…
COMMIT;
I use
BEGIN;
INSERT INTO t_sample VALUES
('abcd', 1, 'abcd', 1),
('abcd', 1, 'abcd', 1),
('abcd', 1, 'abcd', 1),
…
COMMIT;
For those who do not want to read the blog: I am trying to compare memory consumption between the COPY and INSERT methods.
Also, instead of 1 million records I used 2097152 records. I keep these commands in a file, multipleinsert.sql, and run it with psql -d load_test -f multipleinsert.sql.
Also, as described in the blog, I used the COPY method as follows (of course with the same number of records, 2097152):
COPY t_sample FROM stdin;
abcd 1 abcd 1
abcd 1 abcd 1
abcd 1 abcd 1
...
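For anyone who wants to reproduce this without typing two million lines: the rows can be generated from psql with generate_series, for example for the COPY data file (the file name is just an example; the value lists for the big INSERT file can be built the same way):
-- writes one tab-separated line per generated row, on the client side
\copy (SELECT 'abcd', 1, 'abcd', 1 FROM generate_series(1, 2097152)) TO 'copy_data.tsv'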
The execution time for the INSERT is 14.543 s and for COPY it is 1.237 s. Yet, this is not the point I want to compare. When I use COPY, the available memory on the server barely decreases, but during execution of the INSERT statement the available memory drops by nearly 6.5 GB. The file that contains the INSERT statement is only about 49 MB, so I would expect PostgreSQL to cache roughly that much, but why does it grow to ~6.5 GB? Is there a method for (roughly) estimating the memory that PostgreSQL will consume for such a load?
Note: How does COPY work and why is it so much faster than INSERT? explains well why COPY is faster than INSERT, but neither it nor any other post I found on the internet sheds light on the memory consumption.
Upvotes: 3
Views: 2192
Reputation: 246023
The multi-line INSERT is processed differently from COPY.
With the INSERT, a large statement is sent from the client, parsed and planned on the server and then executed. The whole statement is kept in memory on the client and on the server, and the whole big thing has to be parsed and planned, so I am not surprised if that uses a lot of memory.
However, I cannot explain why a 50MB statement would temporarily need 6.5 GB RAM. That seems excessive.
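If you want to stay with INSERT, one way to bound the per-statement memory is to split the load into many smaller multi-row statements inside a single transaction. A rough sketch (the batch size is an arbitrary choice of mine):
BEGIN;
-- one statement per batch: each is parsed and planned on its own,
-- so it only needs memory proportional to its batch size, not to all 2097152 rows
INSERT INTO t_sample VALUES
('abcd', 1, 'abcd', 1),
('abcd', 1, 'abcd', 1),
…
INSERT INTO t_sample VALUES
('abcd', 1, 'abcd', 1),
…
COMMIT;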
The COPY statement itself is small, and the data are streamed from client to server and inserted into the table as they arrive. COPY can batch several rows for a bulk insert for performance, but the batch will be smaller than your two million rows, so it will use way less memory.
To debug why so much memory is used, disable memory overcommit on the machine, then let the INSERT run until it is out of memory. PostgreSQL will write a memory context dump to the log file, which will tell you where the memory is allocated.
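If the server runs PostgreSQL 14 or later, you can also get such a memory context dump without driving the backend out of memory: pg_log_backend_memory_contexts() writes another backend's current memory context statistics to the server log. A sketch (the filter on the query text is just one way to find the right backend):
-- find the backend that is executing the big INSERT
SELECT pid, query
FROM pg_stat_activity
WHERE query LIKE 'INSERT INTO t_sample%';

-- ask that backend to dump its memory contexts to the server log
-- (restricted to superusers by default; replace 12345 with the pid found above)
SELECT pg_log_backend_memory_contexts(12345);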
Upvotes: 0