rapto500134
rapto500134

Reputation: 23

Select a row and then more rows based on a value in that row

I have this table called myTable and I would like to select all rows in which B = DFW, and also all rows in which B = the value of C when B = DFW.

This is what I have so far but for some reason it only returns rows where B = DFW:

SELECT * from myTable AS t1 WHERE EXISTS (
 SELECT C AS c_val from myTable WHERE t1.B="DFW" or (t1.B="DFW" and t1.C=c_val)
)

It should return the row where B = DFW, and where B = DTW as per what I said above.

enter image description here

Upvotes: 0

Views: 323

Answers (2)

Stephen Quan
Stephen Quan

Reputation: 26324

I like the IN version in Shawn's answer.

Generally, one can rewrite IN expressions to an expression involving EXISTS. I will cover both versions here and explain the differences.

Firstly, some preamble, here's the DDL for your dataset:

drop table if exists myTable;

create table myTable (A text, B text, C text);
insert into myTable values ('WN', 'HOU', 'TUL');
insert into myTable values ('AA', 'MCO', 'ORD');
insert into myTable values ('AA', 'DFW', 'DTW');
insert into myTable values ('AA', 'ORD', 'DFW');
insert into myTable values ('F9', 'MCO', 'DTW');
insert into myTable values ('UA', 'LAS', 'LAX');
insert into myTable values ('WN', 'PHX', 'OAK');
insert into myTable values ('OO', 'DTW', 'ESC');
insert into myTable values ('AA', 'LGA', 'DCA');
insert into myTable values ('AA', 'LAX', 'DFW');

create index ix_myTable_001 on myTable (b);
create index ix_myTable_002 on myTable (c);

Here's the IN version of the query along with it's execution plan:

SELECT t.a, t.b, t.c
FROM mytable AS t
WHERE t.b = 'DFW' OR t.b IN (SELECT t2.c FROM mytable AS t2 WHERE t2.b = 'DFW');

explain query plan
SELECT t.a, t.b, t.c
FROM mytable AS t
WHERE t.b = 'DFW' OR t.b IN (SELECT t2.c FROM mytable AS t2 WHERE t2.b = 'DFW');

AA|DFW|DTW
OO|DTW|ESC

0|0|0|SEARCH TABLE mytable AS t USING INDEX ix_myTable_001 (b=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SEARCH TABLE mytable AS t2 USING INDEX ix_myTable_001 (b=?)
0|0|0|SEARCH TABLE mytable AS t USING INDEX ix_myTable_001 (b=?)

Here is the EXISTS version of the query:

select * from myTable T1
where T1.b = 'DFW' OR exists (
SELECT 1
FROM myTable T2
WHERE T2.B = 'DFW'
AND T1.B = T2.C);

explain query plan
select * from myTable T1
where T1.b = 'DFW' OR exists (
SELECT 1
FROM myTable T2
WHERE T2.B = 'DFW'
AND T1.B = T2.C);

AA|DFW|DTW
OO|DTW|ESC

0|0|0|SCAN TABLE myTable AS T1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE myTable AS T2 USING INDEX ix_myTable_002 (c=?)

Summarizing: IN expressions forces the subquery to be fully evaluated first before the outer query executes. EXISTS expressions sometimes both the outer query and the subquery can be evaluated concurrently. This means more flexibility in the execution and potentially savings in execution and memory usage.

Deciding on whether you choose the IN version vs the EXISTS version is whether the subquery would generate a small or large result. If it's small go for the IN version. If it's large, then go for the EXISTS version.

Upvotes: 0

Shawn
Shawn

Reputation: 52614

A UNION with a CTE is one way:

WITH dfws(a, b, c) AS (SELECT a, b, c FROM myTable WHERE b = 'DFW')
SELECT a, b, c FROM dfws
UNION
SELECT t.a, t.b, t.c
FROM myTable AS t
JOIN dfws AS d ON t.b = d.c;

Basically, you use a CTE (Common Table Expression, kind of like a view that's local to a single query) to define dfws as all the rows where b is 'DFW'. You then return those rows, as well as the results of an inner join of dfws against your original full table on the rows where b equals dfws.c.

Edit: Another option is to use the IN operator:

SELECT t.a, t.b, t.c
FROM mytable AS t
WHERE t.b = 'DFW' OR t.b IN (SELECT t2.c FROM mytable AS t2 WHERE t2.b = 'DFW');

After a quick glance at the respective query plans, I'm not sure which would be more efficient when dealing with big tables.

For best results with either you'll want an index on column b, though.

Upvotes: 2

Related Questions