Reputation: 2864
I am having two pretty large schema identical tables in MSSQL (100 million+ of records) and I need to create a SP that returns a unioned result set with a query such as the one defined below. The result can in some cases be just a few records and in other cases many millions depending on the input to the SP.
I also need to sort it on the "S"-value which is a Guid-like value i.e. non sequential.
When loading from one table alone (i.e. without UNION ALL) the sort can take advantage of the index and sort efficiently, but how is that affected by a UNION ALL?
The client application consuming the values want to have a stream of data i.e. we don't want to wait for all records to be read/loaded to be able to return the first set of "S"-matches.
-- Creates #distinctMatches temporary table etc. above..
CREATE CLUSTERED INDEX idx ON #distinctMatches (s)
SELECT
'C' AS [source]
,P.[Id]
,P.[A]
,P.[B]
,P.[C]
,P.[D]
,P.[E]
,C.[S] AS [sortValue]
FROM
[dbo].[data_current] AS P
INNER JOIN #distinctMatches AS C ON P.[s] = C.[s]
UNION ALL
SELECT
'A' AS [source]
,P.[Id]
,P.[A]
,P.[B]
,P.[C]
,P.[D]
,P.[E]
,C.[S] AS [sortValue]
FROM
[dbo].[data_archive] AS P
INNER JOIN #distinctMatches AS C ON P.[s] = C.[s]
ORDER BY [sortValue]
I have not been able to verify if this order by operation can take advantage of the index or not?
I have tried to read the query plans but fail to interpret them around this. Any ideas or alternative suggestions?
This is a legacy application so big schema/data changes or other major redesigns of the architecture is not possible at this point.
UPDATE based information from the answer below:
By changing
FROM
[dbo].[data_archive] AS P
INNER JOIN #distinctMatches AS C ON P.[s] = C.[s]
ORDER BY [sortValue]
to
FROM
[dbo].[data_archive] AS P
WHERE EXISTS (SELECT C.[s] FROM #distinctMatches AS C WHERE P.[s] = C.[s])
ORDER BY [sortValue]
I was able to get a merge join instead of a concatenation. (See answer below).
Upvotes: 2
Views: 871
Reputation: 8687
You said "When loading from one table alone (i.e. without UNION ALL) the sort can take advantage of the index and sort efficiently" so there is a PK on the table with leading key column S or at least a covering index with leading key column S, and this is true for both tables. This means all joins in your query are MERGE joins (the join to temporary table is on the sorting column as well), so there will not be any additional sort in the plan
Upvotes: 1