Reputation: 165
I have a requirement to store records into a database at a rate of 10,000 records/sec (with indexing on a few fields). Each record has 25 columns. I am doing a batch insert of 100,000 records in one transaction block. To improve the insertion rate, I changed the tablespace from disk to RAM. With that I am able to achieve only 5,000 inserts per second.
I have also done the following tuning in the postgres config:
Other information:
I am wondering why a single insert query is taking around 0.2 msec on average when the database is not writing anything to disk (as I am using a RAM-based tablespace). Is there something I am doing wrong?
Help appreciated.
Prashant
Upvotes: 7
Views: 10626
Reputation: 31211
For large volumes of data, bulk load into a staging table with COPY rather than issuing individual INSERT statements:
\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
Cluster each table (CLUSTER) on the index that most of your SELECT statements will use, most important column on the left. In other words, try to align the physical model with the logical model. For example:
CREATE UNIQUE INDEX measurement_001_stc_index
  ON climate.measurement_001
  USING btree (station_id, taken, category_id);
ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
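Once the staging table is loaded, the rows can be moved into the real tables in one statement. A minimal sketch, assuming schema.temp_table has the same columns as the measurement tables (the names follow the \COPY example above):
BEGIN;
-- Move the bulk-loaded rows for category 1 into its partition.
INSERT INTO climate.measurement_001 (taken, station_id, amount, flag, category_id)
SELECT taken, station_id, amount, flag, category_id
FROM schema.temp_table
WHERE category_id = 1;
-- Empty the staging table for the next batch.
TRUNCATE schema.temp_table;
COMMIT;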
On a machine with 4GB of RAM, I did the following...
Tell the Kernel that it's okay for programs to use gobs of shared memory:
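# 536870912 bytes = 512 MB of shared memory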
sysctl -w kernel.shmmax=536870912
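# Reload kernel parameters from /etc/sysctl.conf (add kernel.shmmax there so the setting survives a reboot)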
sysctl -p /etc/sysctl.conf
Edit /etc/postgresql/8.4/main/postgresql.conf and set:
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10
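After restarting PostgreSQL, the values actually in effect can be checked through the standard pg_settings view, for example:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'effective_cache_size', 'checkpoint_segments');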
For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).
CREATE TABLE climate.measurement
(
id bigserial NOT NULL,
taken date NOT NULL,
station_id integer NOT NULL,
amount numeric(8,2) NOT NULL,
flag character varying(1) NOT NULL,
category_id smallint NOT NULL,
CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement: taken date NOT NULL,
-- Inherited from table climate.measurement: station_id integer NOT NULL,
-- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement: flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement: category_id smallint NOT NULL,
CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
OIDS=FALSE
);
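With inheritance-based partitioning, something has to route inserts on the parent table to the matching child. That piece is not shown above; a minimal sketch of a routing trigger (the function and trigger names are made up) might look like:
CREATE OR REPLACE FUNCTION climate.measurement_insert()
RETURNS trigger AS $$
BEGIN
  -- Send each row to the child table that matches its category.
  IF NEW.category_id = 1 THEN
    INSERT INTO climate.measurement_001 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'no partition for category_id %', NEW.category_id;
  END IF;
  RETURN NULL;  -- the row has already been stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trigger
  BEFORE INSERT ON climate.measurement
  FOR EACH ROW EXECUTE PROCEDURE climate.measurement_insert();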
Bump up the table stats for the important columns:
ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
Don't forget to VACUUM and ANALYSE afterwards.
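For example:
VACUUM ANALYSE climate.measurement_001;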
Upvotes: 16
Reputation: 2049
I suggest you use COPY instead of INSERT.
You should also fine-tune your postgresql.conf file.
Read more at http://wiki.postgresql.org/wiki/Performance_Optimization
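For instance (the table, column, and file names here are placeholders):
COPY my_table (col1, col2, col3) FROM '/tmp/data.csv' WITH CSV;
From psql, the client-side \copy variant works the same way without requiring file access on the server.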
Upvotes: 1
Reputation: 127466
Did you also place the xlog (the WAL segments) on your RAM drive? If not, you are still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, etc.? You have to try to get all of your 100,000 records (your single transaction) into your wal_buffers. Increasing this parameter might cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.
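For illustration only, the relevant postgresql.conf entries might look like the following (the values are starting points to experiment with, not recommendations; wal_buffers only takes effect after a restart):
wal_buffers = 16MB
checkpoint_segments = 64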
Upvotes: 3
Reputation: 33257
Are you doing your insert as a series of
INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...
or as one multiple-row insert:
INSERT INTO tablename (...) VALUES (...),(...),(...);
The second one will be significantly faster for 100k rows.
source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/
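For illustration, the multi-row form with made-up column names and values:
INSERT INTO tablename (taken, station_id, amount)
VALUES ('2010-01-01', 1, 2.50),
       ('2010-01-01', 2, 3.75),
       ('2010-01-02', 1, 1.25);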
Upvotes: 5