Bink
Bink

Reputation: 2171

Convert Subquery to Self Join

New to SQL and I understand joins tend to be faster than subqueries. I have the following table, and my current query gives me the results I require, but I can’t wrap my head around a similar query that uses self join instead, assuming it’s possible.

Table

id           scheduled_id action_id
------------ ------------ ------------
1            1            1
2            1            2
3            1            3
4            2            1
5            2            2
6            3            1

Schema

create table ma (
id integer primary key,
scheduled_id integer,
action_id integer
);

insert into ma (
id,
scheduled_id,
action_id
)
values
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 2, 1),
(5, 2, 2),
(6, 3, 1);

Query

select * from ma where action_id = 3
union all
select * from ma where scheduled_id not in (
  select scheduled_id from ma
  where action_id = 3)

Result

id           scheduled_id action_id
------------ ------------ ------------
3            1            3
4            2            1
5            2            2
6            3            1

My result should be all the rows that have the action_id value of 3 plus all the rows for those scheduled_ids that do not have an action_id value of 3.

The sqlfiddle may be found at http://sqlfiddle.com/#!5/0ba51/3.

Thank you.

Upvotes: 2

Views: 678

Answers (6)

sandesh dahake
sandesh dahake

Reputation: 1026

How about this? Though not self join but faster than union

select * from ma
where action_id = 3 or scheduled_id not in (
    select scheduled_id from ma
    where action_id = 3
  )

Upvotes: 1

Niharika Bitra
Niharika Bitra

Reputation: 477

The result you are looking for using a self join is:

SELECT DISTINCT t1.*
FROM ma t1
JOIN ma t2  
ON  t1.SCHEDULED_ID <> t2.SCHEDULED_ID --Satisfies 2nd query
WHERE t2.ACTION_ID = 3 --Satisfies 2nd query
    OR  t1.ACTION_ID = 3 --Satisfies 1st query
ORDER BY t1.ID

Upvotes: 1

iftekhar
iftekhar

Reputation: 447

SELECT m1.* 
FROM ma m1
INNER JOIN
(
    SELECT * 
    FROM ma m2 
    WHERE m2.action_id = 3
) AS matbl 
WHERE m1.action_id = 3 
OR matbl.scheduled_id<>m1.scheduled_id

Hope it will help.

Upvotes: 1

LetEpsilonBeLessThanZero
LetEpsilonBeLessThanZero

Reputation: 2403

This code only works if your action_id's always go 1,2,3,4,etc and never skip over 3. I just wanted to provide an alternate answer in case adding in the concept of max(action_id) could be useful to you.

select ma.id
     , ma.scheduled_id
     , ma.action_id
     , ma_max.max_action_id
from (
    select scheduled_id
         , max(action_id) as max_action_id
    from ma
    group by scheduled_id
) ma_max
join ma
    on ma_max.scheduled_id = ma.scheduled_id 
where (action_id = 3 or max_action_id < 3)

It's almost certainly not going to perform as well as the other answers that are using "EXISTS". I just like how the complexity of the logic gets reduced down to essentially one easy to read line in where (action_id = 3 or max_action_id < 3).

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269583

My result should be all the scheduled_ids that have the value of 3 plus all the scheduled_ids and action_ids for those scheduled_ids that do not have a value of 3.

This is not what your query does. The query that does that is:

select ma.*
from ma
where exists (select 1
              from ma ma2
              where ma2.scheduled_id = ma.scheduled_id and
                    ma2.action_id = 3
             );

Although you could do this with a self-join, it is tricky, because the query can result in duplicates. I recommend exists or in for the logic.

Upvotes: 0

GMB
GMB

Reputation: 222432

I don’t think a JOIN is really what you need here. I would use the following query, which avoids UNION :

SELECT m.* 
FROM ma m
WHERE 
    m.action_id = 3
    OR NOT EXISTS (
        SELECT 1
        FROM ma m1
        WHERE 
            m1.scheduled_id = m.scheduled_id
            AND m1.action_id = 3
    )

When it comes to checking for the existence (or absence) of something, NOT EXISTS with a correlated subquery is usually the most relevant and efficient approach.

Upvotes: 1

Related Questions