AidanO
AidanO

Reputation: 878

Unexpected results using UNION

I'm writing a stored procedure to populate a multi-select control and I want to pull back at least 100 options from a database table, but I want to ensure that I bring back all the options the user has already selected.

The existing query brought back only the first 100 rows:

SELECT TOP 100
    t2.Id,
    t2.value2
FROM
    table1 t1
INNER JOIN
    table2 t2 
  ON
    t1.clientId = t2.clientId AND t1.Id = t2.Id
WHERE
    t1.clientId = 1

I've added a UNION to this query to ensure I always get the selected options back as well:

SELECT TOP 100
        t2.Id,
        t2.value2
  FROM
        table1 t1
INNER JOIN
        table2 t2 ON
        t1.clientId = t2.clientId AND t1.Id = t2.Id
 WHERE
        t1.clientId = 1
 UNION 
SELECT  t2.id, 
        t_si.externalCertificateNumber 
  FROM  table2 t2
INNER JOIN 
        Table3 t3 
    ON t2.value = 333

In one of my test examples, there were 4 previously selected options. 3 of these would have returned in the top 100 (1 was not, hence the need to update).

I expected to get back 101 rows and was surprised to get back 104. I figured this was down to the way the DISTINCT inherent in UNION is working.

However what I am also seeing is different results in the 100 returned. With some, (more than 3) different values showing up. And some that were gone in the previous TOP 100!

So my question is, does UNION have some kind of in built sort that would be effecting the results? Or is there something else going on here?

-----------------------------EDIT------------------------------

If I changed the above queries to TOP 5:

And the previously selected values are:

1, Blue
2, Green
20, Violet
100 Indigo

The result for the SELECT on its own would look like this:

1, Blue
2, Green
7, Red
15, Cyan
20, Violet

The result for the SELECT and the UNION

1, Blue
2, Green
33, Cyan
20, Violet
24, Yellow
100, Indigo
60, Aero
40, Amber
25, Black

This is a made up example not from the DB, but note that the Red in the SELECT is missing from the SELECT and UNION

Upvotes: 1

Views: 592

Answers (2)

Vladimir Baranov
Vladimir Baranov

Reputation: 32703

I'm starting to suspect that adding the UNION has changed the execution plan, and since there is no order explicitly set, that this is affecting what's coming back.

You are correct.

TOP 100 without ORDER BY returns some 100 rows. Which rows are returned is not defined.

The first version of the query without UNION returned one set of 100 rows, the second version of the query with UNION returned another set of 100 rows.

To get predictable and expected result you should add ORDER BY to the query that uses TOP. Ideally the sorting should be unambiguous.

Yes, UNION does have a "built-in" sort. If you look at the execution plan you would usually see it.

I prefer to use Common-Table Expressions (CTE) to write complex queries, like this:

WITH
CTE
AS
(
    SELECT TOP 100
        t2.Id,
        t2.value2
    FROM
        table1 t1
        INNER JOIN table2 t2 
            ON t1.clientId = t2.clientId 
            AND t1.Id = t2.Id
    WHERE
        t1.clientId = 1
    ORDER BY
        t2.Id
)
SELECT
    Id,
    value2
FROM
    CTE

UNION 

SELECT
    t2.id,
    t_si.externalCertificateNumber 
FROM  
    table2 t2
    INNER JOIN Table3 t3 ON t2.value = 333
;

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522817

The UNION operator removes duplicate records. To retain all records use UNION ALL:

SELECT TOP 100
    t2.Id,
    t2.value2
FROM
    table1 t1
    INNER JOIN
    table2 t2 ON
        t1.clientId = t2.clientId AND
        t1.Id = t2.Id
WHERE
    t1.clientId = 1
UNION ALL
SELECT t2.id, t_si.externalCertificateNumber FROM table2 t2
    INNER JOIN Table3 t3 
    ON t2.value = 333

The missing records from the TOP 100 query can easily be explained by the UNION, and also the differing number of records in your result set.

Upvotes: 2

Related Questions