Optimising Postgresql Queries

Question

I have two tables and i have to query my postgresql database. The table 1 has about 140 million records and table 2 has around 50 million records of the following.

the table 1 has the following structure:

tr_id bigint NOT NULL, # this is the primary key
query_id numeric(20,0),       # indexed column
descrip_id numeric(20,0)      # indexed column

and table 2 has the following structure

query_pk  bigint # this is the primary key
query_id  numeric(20,0)    # indexed column
query_token numeric(20,0)

The sample db of table1 would be

The sample db of table2 would be

I am preferring queries in which i would be able to query in blocks of tr_id. In range of 10,000 as this is my requirement.

I would like to get output in the following manner

25  {9554,9456,9785,9514}
26  {7412,7433}
27  {545,5789,1566}
28  {122,1456}

I tried in the following manner

select query_id, 
       array_agg(query_token) 
from sch.table2 
where query_id in (select query_id 
                   from sch.table1 
                   where tr_id between 90001 and 100000) 
group by query_id

I am performing the following query which takes about 121346 ms and when some 4 such queries are fired it still takes longer time. Can you please help me to optimise the same.

I have a machine which runs on windows 7 with i7 2nd gen proc with 8GB of RAM.

The following is my postgresql configuration

shared_buffers = 1GB    
effective_cache_size = 5000MB
work_mem = 2000MB

What should I do to optimise it.

Thanks

EDIT : it would be great if the results ordered according to the following format

25  {9554,9456,9785,9514}
28  {122,1456}
27  {545,5789,1566}
26  {7412,7433}

ie according to the order of the queryid present in table1 ordered by tr_id. If this is computationally expensive may be in the client code i would try to optimise it. But I am not sure how efficient it would be.

Thanks

Erwin Brandstetter · Accepted Answer

Query

I expect a JOIN to be much faster that the IN condition you have presently:

SELECT t2.query_id
      ,array_agg(t2.query_token) AS tokens
FROM   t1
JOIN   t2 USING (query_id)
WHERE  t1.tr_id BETWEEN 1 AND 10000
GROUP  BY t1.tr_id, t2.query_id
ORDER  BY t1.tr_id;

This also sorts the results as requested. query_token remains unsorted per query_id.

Indexes

Obviously you need indexes on t1.tr_id and t2.query_id.
You obviously have that one already:

CREATE INDEX t2_query_id_idx ON t2 (query_id);

A multicolumn index on t1 may improve performance (you'll have to test):

CREATE INDEX t1_tr_id_query_id_idx ON t1 (tr_id, query_id);

Server configuration

If this is a dedicated database server, you can raise the setting for effective_cache_size some more.

@Frank already gave advise on work_mem. I quote the manual:

Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem;

It should be just big enough to be able to sort your queries in RAM. 10 MB is more than plenty to hold 10000 of your rows at a time. Set it higher, if you have queries that need more at a time.

With 8 GB on a dedicated database server, I would be tempted to set shared_buffers to at least 2 GB.

shared_buffers = 2GB    
effective_cache_size = 7000MB
work_mem = 10MB

More advice on performance tuning in the Postgres Wiki.

Optimising Postgresql Queries

Answers (1)

Query

Indexes

Server configuration

Related Questions