AlexGuevara
AlexGuevara

Reputation: 942

SQL Server query involving subqueries - performance issues

I have three tables:

Table 1: | dbo.pc_a21a22 |

    batchNbr                   Other columns...
    --------                   ----------------
      12345  
      12346
      12347

Table 2: | dbo.outcome |

   passageId                 record
   ----------               ---------
      00003                    200
      00003                     9
      00004                     7

Table 3: | dbo.passage |

   passageId                 passageTime        batchNbr
   ----------               -------------       ---------
      00001                   2015.01.01         12345
      00002                   2016.01.01         12345
      00003                   2017.01.01         12345
      00004                   2018.01.01         12346

What I want to do: for each batchNbr in Table 1 get first its latest passageTime and the corresponding passageID from Table 3. With that passageID, get the relevant rows in Table 2 and establish whether any of these rows contains the record 200. Per passageId there are at most 2 records in Table 2

What is the most efficient way to do this?

I have already created a query that works, but it's awfully slow and thus unfit for tables with millions of rows. Any suggestion on how to either change the query or do it another way? Altering the table structure is not an option, I only have read rights to the database.

My current solution (slow):

SELECT TOP 50000
    a.batchNbr,
    CAST ( CASE WHEN 200 in (SELECT TOP 2 record FROM dbo.outcome where passageId in (
    SELECT SubqueryResults.passageId From (SELECT Top 1 passageId FROM dbo.passage pass WHERE pass.batchNbr = a.batchNbr ORDER BY passageTime Desc) SubqueryResults 
    ) 
    ) then 1 else 0 end as bit) as KGT_IO_END

    FROM dbo.pc_a21a22 a

The desired output is:

batchNbr     200present
---------    ----------
12345            1
12346            0

Upvotes: 1

Views: 503

Answers (2)

CurseStacker
CurseStacker

Reputation: 1079

Try this

SELECT TOP 50000 a.*, (CASE WHEN b.record = 200 THEN 1 ELSE 0 END) AS 
KGT_IO_END
FROM dbo.Test1 AS a 
LEFT OUTER JOIN 
(SELECT record, p.batchNbr
 FROM dbo.Test2 AS o
 LEFT OUTER JOIN (SELECT MAX(passageId) AS passageId, batchNbr FROM 
 dbo.Test3 GROUP BY batchNbr) AS p ON o.passageId = p.passageId
) AS b ON a.batchNbr = b.batchNbr;

The MAX subquery is to get the latest passageId by batchNbr. However, your example won't get the record 200, since the passageId of the record with 200 is 00001, while the latest passageId of the batchNbr 12345 is 00003.

I used LEFT OUTER JOIN since the passageId from Table2 no longer match any of the latest passageId from Table3. The resulting subquery would have no records to join to Table1. Therefore INNER JOIN would not show any records from your example data.

Output from your example data:

batchNbr    KGT_IO_END
  12345         0
  12346         0
  12347         0

Output if we change the passageId of record 200 to 00003 (the latest for 12345)

batchNbr    KGT_IO_END
  12345         1
  12346         0
  12347         0

Upvotes: 1

onhax
onhax

Reputation: 339

I suggest you use table joining rather than subqueries.

select 
    a.*, b.*
from 
    dbo.table1 a 
join 
    dbo.table2 b on a.id = b.id 
where 
    /*your where clause for filtering*/

EDIT:

You could use this as a reference Join vs. sub-query

Upvotes: 3

Related Questions