littlegreen

Reputation: 7420

SQL JOIN with two or more tables as output - most efficient way?

I have an SQL query that performs a LEFT JOIN against another table and then inserts all rows that could be coupled into a designated table. I then have a second SQL query that performs the LEFT JOIN again and inserts the rows that could not be coupled into a second designated table. In code, this is something like:

INSERT INTO coupledrecords
SELECT b.col1, b.col2... s.col1, s.col2... FROM bigtable AS b
LEFT JOIN smallertable AS s
ON criterium
WHERE s.col1 IS NOT NULL

INSERT INTO notcoupledrecords
SELECT b.col1, b.col2... FROM bigtable AS b
LEFT JOIN smallertable AS s
ON criterium
WHERE s.col1 IS NULL

My question: I now have to execute the JOIN twice in order to achieve what I want. I have a feeling that this is twice as slow as it could be. Is this true, and if so, is there a way to do it more efficiently?

Upvotes: 2

Views: 666

Answers (5)

Justin Grant

Reputation: 46663

If you can change the clustered indexes of coupledrecords and notcoupledrecords to contain one column from smalltable (for example by tacking a computed bit column on the end of each clustered index, solely for this purpose -- see @Martin Smith's answer for details), then you could use a Partitioned View for the insert. That's the easy possibility.

If that's not possible, then you can also try a non-partitioned view solution. See below-- it's more involved.

Without knowing how your data is distributed (e.g. row size, number of nullable vs. non-nullable columns, ratio of coupled to uncoupled) it's hard to recommend a general solution, but one solution which might work well in most cases is to use views to simulate coupled and uncoupled tables on top of a single "maybecoupled" table. Using views means your existing query code (other than the insert) won't have to change.

This seems horribly inefficient at first glance, but remember that nulls take up zero storage space and, with proper indexes, SQL Server will not waste much time filtering out the "other view's" rows.

Here's how it would work:

  • INSERT all records into the base table (e.g. maybecoupled) in a single pass
  • Ensure there's an index (ideally a clustered index, but nonclustered is OK too) on one of the columns you got from smalltable. Let's assume this is indexedcol1.
  • Create two views on top: coupledrecords and notcoupledrecords, whose definitions are SELECT col1, ... FROM maybecoupled WHERE indexedcol1 IS NOT NULL and SELECT col1, ... FROM maybecoupled WHERE indexedcol1 IS NULL, respectively (see the sketch after this list).
  • if you have a clustered index on indexedcol1, then you will pay little or no perf penalty for most queries, since every query to either view will only hit the appropriate half of the records and never touch the other half. Your non-clustered indexes will get a little bigger and hence a little slower, but even that can be ameliorated with filtered indexes.
  • if you can't use a clustered index, make sure that indexedcol1 is part of (or INCLUDE-d into) every non-clustered index. This prevents having to go back to the clustered index to look up indexedcol1 for queries which are only pulling data from non-clustered indexes.
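
To make the steps concrete, here's a minimal sketch of that setup. The maybecoupled and indexedcol1 names come from the bullets above; the remaining table/column names and types are purely illustrative placeholders:

-- Single base table holding bigtable's columns plus the columns copied from
-- smalltable (NULL in the smalltable columns means "not coupled").
CREATE TABLE dbo.maybecoupled
(
    bcol1       int          NOT NULL,
    bcol2       varchar(50)  NULL,
    indexedcol1 int          NULL,   -- copied from smalltable
    scol2       varchar(50)  NULL
)

-- Ideally cluster on indexedcol1 so each view only ever touches its own half.
CREATE CLUSTERED INDEX cx_maybecoupled ON dbo.maybecoupled (indexedcol1, bcol1)
GO

CREATE VIEW dbo.coupledrecords AS
    SELECT bcol1, bcol2, indexedcol1, scol2
    FROM dbo.maybecoupled
    WHERE indexedcol1 IS NOT NULL
GO

CREATE VIEW dbo.notcoupledrecords AS
    SELECT bcol1, bcol2
    FROM dbo.maybecoupled
    WHERE indexedcol1 IS NULL
GO

-- If you can't cluster on indexedcol1, filtered nonclustered indexes are one
-- way to keep each half cheap to scan:
CREATE NONCLUSTERED INDEX ix_coupled_bcol2
    ON dbo.maybecoupled (bcol2) WHERE indexedcol1 IS NOT NULL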

Here are a few cases where the above solution won't work:

  • if the number of coupled rows is relatively small and you either have a lot of non-nullable columns in bigtable or you have very small rows. Then the space overhead of all those non-coupled rows may hurt. (nulls don't take up space, but non-nullable columns do.)
  • if you are using a non-clustered index on indexedcol1 and can't change your indexes to ensure indexedcol1 is present in your non-clustered indexes
  • if the shenanigans above cause SQL Server to pick the wrong indexes to use in query plans due to increased complexity of queries (although you can fix this with index hints)

Caveat: you'll definitely want to test performance of any view-based solution to make sure it doesn't make things worse-- SQL Server is usually good at picking good query plans, but not always. Test, Test, Test!

Upvotes: 2

Martin Smith

Reputation: 452978

One way I think you could do it would be to create a Partitioned View with a check constraint on a column indicating coupled/not coupled. Then insert into the view and let SQL Server figure out the destination table. Not suggesting that you actually do it, just thought I'd mention it as a possibility!

INSERT INTO coupledrecordsView
SELECT CASE WHEN s.col1 IS NULL THEN 0 ELSE 1 END AS IsCoupled,
b.col1, b.col2... s.col1, s.col2... FROM bigtable AS b
LEFT JOIN smallertable AS s
ON criterium
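
For reference, a rough sketch of what coupledrecordsView itself might look like, assuming both destination tables carry the same column list plus an IsCoupled bit column with mutually exclusive CHECK constraints (for inserts through the view to work, the partitioning column also has to be part of each table's primary key and can't be computed, identity, or defaulted -- the column names here are illustrative):

-- Sketch only: the real tables would need IsCoupled in their primary keys and
-- matching column lists in both branches of the UNION ALL.
ALTER TABLE coupledrecords    ADD IsCoupled bit NOT NULL CHECK (IsCoupled = 1)
ALTER TABLE notcoupledrecords ADD IsCoupled bit NOT NULL CHECK (IsCoupled = 0)
GO
CREATE VIEW coupledrecordsView
AS
SELECT IsCoupled, col1, col2 FROM dbo.coupledrecords
UNION ALL
SELECT IsCoupled, col1, col2 FROM dbo.notcoupledrecords
GO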

Upvotes: 3

AdaTheDev

Reputation: 147224

If you're inserting the different results into 2 different tables, you will need 2 different queries.

The only thing I'd suggest is that the "coupledrecords" query can just be an INNER JOIN:

INSERT INTO coupledrecords
SELECT b.col1, b.col2... s.col1, s.col2... FROM bigtable AS b
INNER JOIN smallertable AS s 
ON criterium

If you were inserting into the SAME table, with a field to indicate whether or not it was a matched record, then yes, you could do it as one query.
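
For example, a sketch of that single-query version, assuming a combined destination table (here called allrecords, with an iscoupled flag -- both names are just for illustration):

INSERT INTO allrecords (iscoupled, bcol1, bcol2, scol1, scol2)
SELECT CASE WHEN s.col1 IS NULL THEN 0 ELSE 1 END,
       b.col1, b.col2, s.col1, s.col2
FROM bigtable AS b
LEFT JOIN smallertable AS s
    ON criterium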

Upvotes: 4

Preet Sangha

Reputation: 65476

You're basically chopping the data in two. Once you've run the first insert, you have a set of keys in coupledrecords and can just do a NOT IN against that:

INSERT INTO notcoupledrecords
SELECT b.col1, b.col2... FROM bigtable AS b
WHERE some_col NOT IN (SELECT some_col FROM coupledrecords)

Upvotes: 0

chris

Reputation: 9993

The bottom line is, you need 2 queries because you're inserting into 2 different tables. If you only had one table instead of coupledrecords and notcoupledrecords, then you could do it in 1 query. :)

Upvotes: 3
