rshar
rshar

Reputation: 1475

How to make a postgres query run faster?

I have following tables with Table1 with rows: >25million rows

Table1:

chrom strand ref_base alt_base pos      gene_ensembl_identifier seq_window_9mers  mutated_base seq_window_mut_9mers 
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ -------------------- 
3     1      C        T        40457498 ENSG00000168032         ACGCTCTACACACACAG A            ACGCTCTAAACACACAG    

Table2

seq_window_mut_9mers start substring 
-------------------- ----- --------- 
ACGCTCTAAACACACAG    1     ACGCTCTAA
ACGCTCTAAACACACAG    2     CGCTCTAAA
ACGCTCTAAACACACAG    3     GCTCTAAAC
ACGCTCTAAACACACAG    4     CTCTAAACA
ACGCTCTAAACACACAG    5     TCTAAACAC
ACGCTCTAAACACACAG    6     CTAAACACA
ACGCTCTAAACACACAG    7     TAAACACAC
ACGCTCTAAACACACAG    8     AAACACACA
ACGCTCTAAACACACAG    9     AACACACAG


I would like to perform a join to have the following table on column seq_window_mut_9mers.

final_table

chrom strand ref_base alt_base pos      gene_ensembl_identifier   seq_window_mut_9mers  substring
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ -------------------- 
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     ACGCTCTAA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     CGCTCTAAA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     GCTCTAAAC
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     CTCTAAACA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     TCTAAACAC
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     CTAAACACA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     TAAACACAC
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     AAACACACA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     AACACACAG

I am running following postgres query through dbvisualizer. At the moment, the query is running very slow (still waiting for the output.. >10 mins).

SELECT 
chrom, strand, ref_base, alt_base, pos, gene_ensembl_identifier, mut.seq_window_mut_9mers substring 
FROM table1    
LEFT JOIN table2 ON mer9.seq_window_mut_9mers = table1.seq_window_mut_9mers;

How can I make it run faster? Any suggestions will be really helful.

Thanks

Upvotes: 0

Views: 189

Answers (1)

Tometzky
Tometzky

Reputation: 23920

It looks like you don't really need to join with table2. You can generate it on the fly with substring function, like this:

select
  table1.*,
  offsets.start,
  substring(seq_window_mut_9mers from offsets.start for 9) as substring
from
  table1,
  (select generate_series(1,9) as start) as offsets;

It will be much faster that a join.

Upvotes: 3

Related Questions