Reputation: 23
I have this table called myTable and I would like to select all rows in which B = DFW, and also all rows in which B = the value of C when B = DFW.
This is what I have so far but for some reason it only returns rows where B = DFW:
SELECT * from myTable AS t1 WHERE EXISTS (
SELECT C AS c_val from myTable WHERE t1.B="DFW" or (t1.B="DFW" and t1.C=c_val)
)
It should return the row where B = DFW, and where B = DTW as per what I said above.
Upvotes: 0
Views: 323
Reputation: 26324
I like the IN
version in Shawn's answer.
Generally, one can rewrite IN
expressions to an expression involving EXISTS
. I will cover both versions here and explain the differences.
Firstly, some preamble, here's the DDL for your dataset:
drop table if exists myTable;
create table myTable (A text, B text, C text);
insert into myTable values ('WN', 'HOU', 'TUL');
insert into myTable values ('AA', 'MCO', 'ORD');
insert into myTable values ('AA', 'DFW', 'DTW');
insert into myTable values ('AA', 'ORD', 'DFW');
insert into myTable values ('F9', 'MCO', 'DTW');
insert into myTable values ('UA', 'LAS', 'LAX');
insert into myTable values ('WN', 'PHX', 'OAK');
insert into myTable values ('OO', 'DTW', 'ESC');
insert into myTable values ('AA', 'LGA', 'DCA');
insert into myTable values ('AA', 'LAX', 'DFW');
create index ix_myTable_001 on myTable (b);
create index ix_myTable_002 on myTable (c);
Here's the IN
version of the query along with it's execution plan:
SELECT t.a, t.b, t.c
FROM mytable AS t
WHERE t.b = 'DFW' OR t.b IN (SELECT t2.c FROM mytable AS t2 WHERE t2.b = 'DFW');
explain query plan
SELECT t.a, t.b, t.c
FROM mytable AS t
WHERE t.b = 'DFW' OR t.b IN (SELECT t2.c FROM mytable AS t2 WHERE t2.b = 'DFW');
AA|DFW|DTW
OO|DTW|ESC
0|0|0|SEARCH TABLE mytable AS t USING INDEX ix_myTable_001 (b=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SEARCH TABLE mytable AS t2 USING INDEX ix_myTable_001 (b=?)
0|0|0|SEARCH TABLE mytable AS t USING INDEX ix_myTable_001 (b=?)
Here is the EXISTS
version of the query:
select * from myTable T1
where T1.b = 'DFW' OR exists (
SELECT 1
FROM myTable T2
WHERE T2.B = 'DFW'
AND T1.B = T2.C);
explain query plan
select * from myTable T1
where T1.b = 'DFW' OR exists (
SELECT 1
FROM myTable T2
WHERE T2.B = 'DFW'
AND T1.B = T2.C);
AA|DFW|DTW
OO|DTW|ESC
0|0|0|SCAN TABLE myTable AS T1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE myTable AS T2 USING INDEX ix_myTable_002 (c=?)
Summarizing: IN
expressions forces the subquery to be fully evaluated first before the outer query executes. EXISTS
expressions sometimes both the outer query and the subquery can be evaluated concurrently. This means more flexibility in the execution and potentially savings in execution and memory usage.
Deciding on whether you choose the IN
version vs the EXISTS
version is whether the subquery would generate a small or large result. If it's small go for the IN
version. If it's large, then go for the EXISTS
version.
Upvotes: 0
Reputation: 52614
A UNION with a CTE is one way:
WITH dfws(a, b, c) AS (SELECT a, b, c FROM myTable WHERE b = 'DFW')
SELECT a, b, c FROM dfws
UNION
SELECT t.a, t.b, t.c
FROM myTable AS t
JOIN dfws AS d ON t.b = d.c;
Basically, you use a CTE (Common Table Expression, kind of like a view that's local to a single query) to define dfws
as all the rows where b
is 'DFW'. You then return those rows, as well as the results of an inner join of dfws
against your original full table on the rows where b
equals dfws.c
.
Edit:
Another option is to use the IN
operator:
SELECT t.a, t.b, t.c
FROM mytable AS t
WHERE t.b = 'DFW' OR t.b IN (SELECT t2.c FROM mytable AS t2 WHERE t2.b = 'DFW');
After a quick glance at the respective query plans, I'm not sure which would be more efficient when dealing with big tables.
For best results with either you'll want an index on column b, though.
Upvotes: 2