Jon
Jon

Reputation: 1142

SQL: Optimize nested query with redundant union code?

Although the below query works perfectly fine for my uses, I am trying to understand if there is a way to optimize, since it uses the same nested subquery in both statements of the UNION. My instincts say there should be a way to look up the subquery once and use that result for both parts of the UNION, but I am running into syntactic issues when trying to use join across the UNIONs.

SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
    FROM dbo.ClientLocal
    WHERE clientLocalID = 
        ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789')
UNION
SELECT ClientMasterName, '0', clientMasterID, 2
    FROM dbo.ClientMaster
    WHERE clientMasterID = 
        ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789')
ORDER BY HierarchyType ASC;

To help explain why the original query is necessary, imagine a concept of a "local" client and a "master" client -- the local client is tied to a master client, but the master client may or may not have any local clients. Likewise, the phone number associated with each should only be tied to either the local client or the master client (hence why in the second statement, a '0' is hardcoded for the local client ID -- because there is no association).

This would be running against Microsoft SQL Server 2008 R2 by the way.

Upvotes: 2

Views: 687

Answers (4)

deroby
deroby

Reputation: 6002

Since nobody seems to mention it: having less code does not necessarily mean that it will be faster in execution too. Too often people seem to think that CTE's are executed first (or "in order" in case there are multiple of them) and then the final query is applied; they are NOT. In fact, the server will pretty much handle the cte as if it was a derived table somewhere in the FROM or JOIN clause and as such pick exactly the same query plan to execute them. In order avoid the system to have to scan the ClientPhone table twice (LIKE '%0123...' is a very 'annoying' operation) you'd have to do it once upfront, store the result in a temp-table and then use said temp-table in your actual query.

To give some context to this, I took the liberty to create some sample-data (see code below). The results of my test were:

  • The original query takes 71ms
  • The CTE version takes 71ms and uses the EXACT same query plan
  • The creation of the temp-table takes 26ms, the query using it 23ms

So even though the server needs to create the temp-table and fill it, combined the latter still only takes 49ms instead of the original 71ms!

Off course your mileage may vary depending on the number of records involved and the complexity of the repeated query, like I said, LIKE '%blah' is an annoying thing to do because it requires a full scan of the table (or with a bit of luck of a (covering) index). If it had been WHERE pk_field = @value the effect might have been very different)

Happy querying...

IF DB_ID('test') IS NULL CREATE DATABASE test
GO
USE test
GO

-- setup
IF OBJECT_ID('ClientPhone') IS NOT NULL DROP TABLE ClientPhone 
IF OBJECT_ID('ClientLocal') IS NOT NULL DROP TABLE ClientLocal
IF OBJECT_ID('ClientMaster') IS NOT NULL DROP TABLE ClientMaster
GO
SELECT TOP 50000 
       HierarchyItem = IDENTITY(int, 1, 1),
       PhoneNumber   = Convert(varchar(100), NewID())
  INTO dbo.ClientPhone 
  FROM sys.objects x1, sys.columns x2, sys.objects x3, sys.columns x4

SELECT ClientLocalName = 'client dummy', 
       ClientLocalID   = Convert(int, Rand(HierarchyItem * 37) * 50000), 
       ClientMasterID   = Convert(int, Rand(HierarchyItem * 47) * 50000), 
       HierarchyType   = 1
  INTO dbo.ClientLocal
  FROM dbo.ClientPhone 

SELECT ClientMasterName = 'master dummy', 
       ClientLocalID    = Convert(int, Rand(HierarchyItem * 51) * 50000), 
       ClientMasterID   = Convert(int, Rand(HierarchyItem * 53) * 50000), 
       HierarchyType    = 2
  INTO dbo.ClientMaster
  FROM dbo.ClientPhone 

GO
-- original 
SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
  FROM dbo.ClientLocal
 WHERE ClientLocalID = ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%01')
UNION
SELECT ClientMasterName, '0', ClientMasterID, 2
  FROM dbo.ClientMaster
 WHERE ClientMasterID = ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%01')
ORDER BY HierarchyType ASC;

-- CTE
;WITH cte
   AS (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%01')

SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
  FROM dbo.ClientLocal
 WHERE ClientLocalID = ANY (SELECT HierarchyItem FROM cte)
UNION
SELECT ClientMasterName, '0', ClientMasterID, 2
  FROM dbo.ClientMaster
 WHERE ClientMasterID = ANY (SELECT HierarchyItem FROM cte)
ORDER BY HierarchyType ASC;

-- temp table
IF OBJECT_ID('tempdb..#ClientPhone') IS NOT NULL DROP TABLE #ClientPhone

SELECT HierarchyItem INTO #ClientPhone FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%01'

SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
  FROM dbo.ClientLocal
 WHERE ClientLocalID = ANY (SELECT HierarchyItem FROM #ClientPhone)
UNION
SELECT ClientMasterName, '0', ClientMasterID, 2
  FROM dbo.ClientMaster
 WHERE ClientMasterID = ANY (SELECT HierarchyItem FROM #ClientPhone)
ORDER BY HierarchyType ASC;

Upvotes: 1

Lukasz Szozda
Lukasz Szozda

Reputation: 176024

If you don't like repeating code you could use common table expression:

WITH cte AS (
  SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789'
)
SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
FROM dbo.ClientLocal
WHERE clientLocalID IN (SELECT HierarchyItem FROM cte)
UNION
SELECT ClientMasterName, '0', clientMasterID, 2
FROM dbo.ClientMaster
WHERE clientMasterID IN (SELECT HierarchyItem FROM cte)
ORDER BY HierarchyType ASC;

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1270713

The only obvious optimization I can see is to use union all instead of union:

SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
FROM dbo.ClientLocal
WHERE clientLocalID = ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789')
UNION ALL
SELECT ClientMasterName, '0', clientMasterID, 2
FROM dbo.ClientMaster
WHERE clientMasterID = ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789')
ORDER BY HierarchyType ASC;

You clearly have no duplicates between the sub-selects (because of the last column). This assumes you have no duplicates within each sub-select either.

I don't know about SQL Server's treatment of = ANY, but I assume it is basically the same as IN. The = ANY subquery would be run once (for each subquery), but the second time the results would be cached. You could simplify this to:

SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
FROM ((SELECT ClientLocalName, ClientLocalID, ClientMasterID
       FROM dbo.ClientLocal
      ) UNION ALL
      (SELECT ClientLocalName, '0', ClientMasterID
       FROM dbo.ClientMasterName
      ) 
     ) c        
WHERE c.clientLocalID = ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789')
ORDER BY HierarchyType ASC;

I don't know if this would be much of an improvement.

The LIKE pattern has a wildcard at the beginning of the number, making that hard to optimize.

Upvotes: 0

DanB
DanB

Reputation: 2124

;with subQry as (
    SELECT ClientLocalName, ClientLocalID, ClientMasterID
        FROM dbo.ClientLocal
        WHERE clientLocalID = 
            ANY (SELECT HierarchyItem FROM dbo.ClientPhone WHERE PhoneNumber LIKE '%0123456789')
)
SELECT ClientLocalName, ClientLocalID, ClientMasterID, 1 HierarchyType
    FROM subQry
UNION
SELECT ClientMasterName, '0', clientMasterID, 2
    FROM subQry
ORDER BY HierarchyType ASC;

Upvotes: 0

Related Questions