Reputation: 1475
I have following tables with Table1 with rows: >25million rows
Table1:
chrom strand ref_base alt_base pos gene_ensembl_identifier seq_window_9mers mutated_base seq_window_mut_9mers
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ --------------------
3 1 C T 40457498 ENSG00000168032 ACGCTCTACACACACAG A ACGCTCTAAACACACAG
Table2
seq_window_mut_9mers start substring
-------------------- ----- ---------
ACGCTCTAAACACACAG 1 ACGCTCTAA
ACGCTCTAAACACACAG 2 CGCTCTAAA
ACGCTCTAAACACACAG 3 GCTCTAAAC
ACGCTCTAAACACACAG 4 CTCTAAACA
ACGCTCTAAACACACAG 5 TCTAAACAC
ACGCTCTAAACACACAG 6 CTAAACACA
ACGCTCTAAACACACAG 7 TAAACACAC
ACGCTCTAAACACACAG 8 AAACACACA
ACGCTCTAAACACACAG 9 AACACACAG
I would like to perform a join to have the following table on column seq_window_mut_9mers.
final_table
chrom strand ref_base alt_base pos gene_ensembl_identifier seq_window_mut_9mers substring
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ --------------------
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG ACGCTCTAA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG CGCTCTAAA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG GCTCTAAAC
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG CTCTAAACA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG TCTAAACAC
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG CTAAACACA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG TAAACACAC
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG AAACACACA
3 1 C T 40457498 ENSG00000168032 ACGCTCTAAACACACAG AACACACAG
I am running following postgres query through dbvisualizer. At the moment, the query is running very slow (still waiting for the output.. >10 mins).
SELECT
chrom, strand, ref_base, alt_base, pos, gene_ensembl_identifier, mut.seq_window_mut_9mers substring
FROM table1
LEFT JOIN table2 ON mer9.seq_window_mut_9mers = table1.seq_window_mut_9mers;
How can I make it run faster? Any suggestions will be really helful.
Thanks
Upvotes: 0
Views: 189
Reputation: 23920
It looks like you don't really need to join with table2. You can generate it on the fly with substring function, like this:
select
table1.*,
offsets.start,
substring(seq_window_mut_9mers from offsets.start for 9) as substring
from
table1,
(select generate_series(1,9) as start) as offsets;
It will be much faster that a join.
Upvotes: 3