Viplime
Viplime

Reputation: 141

More Efficient Way For SQL IN Clause

I have this sql query and this takes very long time to finish. How can i speed this up?

t_inter_specises_interaction has 60k rows and t_pathway has 100k. uniprot_id_1,uniprot_id_2,uniprot_id are varchar types.

In this query, i want to select uniprot_id_1 and uniprot_id_2 which are both present in t_pathway :

select distinct uniprot_id_1,uniprot_id_2 from t_intra_species_interaction
where uniprot_id_1 in (select uniprot_id from t_pathway) and
  uniprot_id_2 in (select uniprot_id from t_pathway)

In this one, i want to select uniprot_id which is present in the set uniprot_ids returned by the first query above.

select distinct uniprot_id,id from t_pathway as t
where uniprot_id in
(
    select distinct uniprot_id_2 from t_intra_species_interaction
    where uniprot_id_1 in (select uniprot_id from t_pathway) and
      uniprot_id_2 in (select uniprot_id from t_pathway)
    union
    select distinct uniprot_id_1 from t_intra_species_interaction
    where uniprot_id_1 in (select uniprot_id from t_pathway) and
      uniprot_id_2 in (select uniprot_id from t_pathway)
)

Thanks.

Upvotes: 1

Views: 361

Answers (6)

Serpiton
Serpiton

Reputation: 3684

The subqueries are identical, therefore they can be merged into one, and moved to a join

SELECT DISTINCT i.uniprot_id_1, i.uniprot_id_2 
FROM   t_intra_species_interaction i
       INNER JOIN t_pathway p ON p.uniprot_id IN (i.uniprot_id_1, i.uniprot_id_2)

Second query
It would have be better to open a new question referring to this, but looking at my previous query it should be easy to see that to get your second answer, you only need to get the column from t_pathway instead of t_intra_species_interaction

SELECT DISTINCT p.uniprot_id, p.id
FROM   t_intra_species_interaction i
       INNER JOIN t_pathway p ON p.uniprot_id IN (i.uniprot_id_1, i.uniprot_id_2)

Upvotes: 2

John Bupit
John Bupit

Reputation: 10618

How about this:

select distinct uniprot_id_1, uniprot_id_2
from t_intra_species_interaction
where exists (select uniprot_id from t_pathway
              where uniprot_id_1 = uniprot_id) and
      exists (select uniprot_id from t_pathway
              where uniprot_id_2 = uniprot_id)

Upvotes: 1

Hamlet Hakobyan
Hamlet Hakobyan

Reputation: 33381

Try this:

select distinct uniprot_id_1,uniprot_id_2
from t_intra_species_interaction I
  join t_pathway P1
    on I.uniprot_id_1 = P1.uniprot_id
  join t_pathway P2
    on I.uniprot_id_2 = P2.uniprot_id

or

select distinct uniprot_id_1,uniprot_id_2
from t_intra_species_interaction I
where exists (select 1 from t_pathway where uniprot_id = I.uniprot_id_1)
  and exists (select 1 from t_pathway where uniprot_id = I.uniprot_id_2)

Upvotes: 2

Susanna
Susanna

Reputation: 72

EXISTS or JOIN will be much more efficient.

Upvotes: 2

Reza
Reza

Reputation: 19843

There are a general guildline:

Create three indexes, one on t_pathway.uniport_id and one on t_intra_species_interaction.uniport_id1 and another on t_intra_species_interaction.uniport_id2

By this way all the data that you need is in your indexes, and it should be fast

Also converting your in clauses to Left join as Tomas mentioned in his answer

Upvotes: 1

Tomas Pastircak
Tomas Pastircak

Reputation: 2857

You may want to use INNER JOIN:

select distinct uniprot_id_1,uniprot_id_2 from t_intra_species_interaction i
inner join t_pathway p1
    on p1.uniprod_id = t.uniprot_id_1 
inner join t_pathway p2
    on p2.uniprod_id = t_uniprot_id_2

Upvotes: 2

Related Questions