Marcus K
Marcus K

Reputation: 779

T-SQL SELECT TOP returns duplicates

I'm using SQL Server 2008 R2.

I'm not sure if I've discovered a strange SQL quirk, or (more likely) something in my code is causing this strange behaviour, particularly as Google has turned up nothing. I have a view called vwResponsible_Office_Address.

SELECT * FROM vwResponsible_Office_Address

..returns 403 rows

This code:

SELECT  TOP 1000 *  FROM vwResponsible_Office_Address

..returns 409 rows, as it includes 6 duplicates.

However this:

SELECT  TOP 1000 *  FROM vwResponsible_Office_Address
ORDER BY ID

..returns 403 rows again.

I can post the code for the view if it's relevant, but does it make sense for SELECT TOP to ever work in this way? I understand that SELECT TOP is free to return records in any order but don't understand why the number of records returned should vary.

The view does use cross apply which might be affecting the result set some how?

EDIT: View definition as requested

CREATE VIEW [dbo].[vwResponsible_Office_Address] 
AS
    SELECT  fp.Entity_ID [Reg_Office_Entity_ID],
            fp.Entity_Name [Reg_Office_Entity_Name],
            addr.Address_ID
    FROM [dbo].[Entity_Relationship] er
    INNER JOIN [dbo].[Entity] fp
        ON er.[Related_Entity_ID] = fp.[Entity_ID]
    INNER JOIN [dbo].[Entity_Address] ea
        ON ea.[Entity_ID] = fp.[Entity_ID]
    CROSS APPLY (
        SELECT TOP 1 Address_ID 
        FROM [dbo].[vwEntity_Address] vea   
        WHERE [vea].[Entity_ID] = fp.Entity_ID  
        ORDER by ea.[Address_Type_ID] ASC, ea.[Address_ID] DESC
    ) addr
    WHERE [Entity_Relationship_Type_ID] = 25 -- fee payment relationship

    UNION

    SELECT  ets.[Entity_ID],
            ets.[Entity_Name],
            addr.[Address_ID]
    FROM  dbo.[vwEntity_Entitlement_Status] ets
    INNER JOIN dbo.[Entity_Address] ea
        ON ea.[Entity_ID] = ets.[Entity_ID]
    CROSS APPLY (
        SELECT TOP 1 [Address_ID]
        FROM  [dbo].[vwEntity_Address] vea
        WHERE vea.[Entity_ID] = ets.[Entity_ID]
        ORDER by ea.[Address_Type_ID] ASC, ea.[Address_ID] DESC
    ) addr
    WHERE ets.[Entitlement_Type_ID] = 40 -- registered office
    AND ets.[Entitlement_Status_ID] = 11 -- active

Upvotes: 4

Views: 1312

Answers (2)

jmoreno
jmoreno

Reputation: 13561

I was wondering if your view included a function, until I got to the end, where you say you use cross-apply. I would assume that is your problem, if your interested in the details, take a look at the various query plans.

EDIT: Expansion of answer I.e. your function is non-deterministic and can either return more than one row per input or return the same row for different input. In combination, this means that you'll get exactly what you are seeing: duplicate rows under some circumsntaces. Adding a distinct to your view is the costly way to solve your problem, a better way would be to change your function so that for any input there is only one row output, and for a row output only one input will produce that row.

EDIT: I didn't see that you're now including your view definition. Your problem is definitely the cross apply, in particular you are sorting inside the cross apply by values from OUTSIDE of the cross apply, making the top 1 effectively random.

Upvotes: 2

Martin Smith
Martin Smith

Reputation: 453067

I would assume that there is some non determinism going on which means that different access methods can return different results.

Looking at the view definition the only place that appears likely would be if vwEntity_Address has some duplicates for Entity_ID.

This would make the top 1 Address_ID returned arbitrary in that case which will effect the result of the union operation when it removes duplicates.

Definitely this does look extremely suspect

SELECT TOP 1 [Address_ID]
        FROM  [dbo].[vwEntity_Address] vea
        WHERE vea.[Entity_ID] = ets.[Entity_ID]
        ORDER by ea.[Address_Type_ID] ASC, ea.[Address_ID] DESC

You are ordering by values from the outer query in the cross apply. This will have absolutely no effect whatsoever as these will be constant for a particular CROSS APPLY invocation.

Can you try changing to

SELECT TOP 1 [Address_ID]
        FROM  [dbo].[vwEntity_Address] vea
        WHERE vea.[Entity_ID] = ets.[Entity_ID]
        ORDER by vea.[Address_ID] DESC

Upvotes: 3

Related Questions