Chris Tsongas
Chris Tsongas

Reputation: 33

Removing duplicate records before numbering rows in result

So, I know just enough SQL to be dangerous and am following an example to pull one page's worth of records from a table:

SELECT TOP #arguments.perPage# * FROM (

SELECT DISTINCT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS rownum, productdiagramparts.productdiagramid AS productdiagramid, products.id AS productid, products.title AS producttitle, totalRows = COUNT(*) OVER()
FROM manufacturers
INNER JOIN products ON manufacturers.id = products.manufacturerid
INNER JOIN productdiagramparts ON products.id = productdiagramparts.productid
INNER JOIN productdiagrams ON productdiagramparts.productdiagramid = productdiagrams.id
WHERE #whereClause#
) _tmpInlineView

WHERE rownum > #offset#
ORDER BY producttitle

The SELECT TOP wrapped around this of course pulls just the records for the current page. Trouble is, there are duplicates in the innermost SELECT statement I want to remove, but using DISTINCT doesn't work as shown above because the rows have already been numbered for the outer query. How can I make my innermost SELECT results distinct before numbering the rows?

Here's the solution based on the accepted answer below:

SELECT TOP #arguments.perPage# * FROM (

SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS rownum, productdiagramid, productid, producttitle, totalRows = COUNT(*) OVER() FROM (

SELECT DISTINCT productdiagramparts.productdiagramid AS productdiagramid, products.id AS productid, products.title AS producttitle FROM manufacturers INNER JOIN products ON manufacturers.id = products.manufacturerid INNER JOIN productdiagramparts ON products.id = productdiagramparts.productid INNER JOIN productdiagrams ON productdiagramparts.productdiagramid = productdiagrams.id WHERE #whereClause#

) _tmpDupRemove

) _tmpInlineView

WHERE rownum > #offset# ORDER BY producttitle

Upvotes: 3

Views: 196

Answers (3)

Oleksandr Fedorenko
Oleksandr Fedorenko

Reputation: 16904

You can use GROUP BY clause.

Order in SQLServer: GROUP BY-->SELECT ROW_NUMBER()-->DISTINCT. Therefore GROUP BY instead of DISTINCT is necessary to you

SELECT TOP #arguments.perPage# * FROM (
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS rownum, 
       productdiagramparts.productdiagramid AS productdiagramid, 
       products.id AS productid, 
       products.title AS producttitle, 
       totalRows = COUNT(*) OVER()
FROM manufacturers
INNER JOIN products ON manufacturers.id = products.manufacturerid
INNER JOIN productdiagramparts ON products.id = productdiagramparts.productid
INNER JOIN productdiagrams ON productdiagramparts.productdiagramid = productdiagrams.id
GROUP BY productdiagramparts.productdiagramid, products.id, products.title
WHERE #whereClause#
) _tmpInlineView
WHERE rownum > #offset#
ORDER BY producttitle

Upvotes: 0

A Aiston
A Aiston

Reputation: 717

DECLARE @startrow Int = 0
DECLARE @recsperpage Int = 10


SELECT  [Inner2].[productdiagramid], 
        [Inner2].[productid], 
        [Inner2].[producttitle]
FROM        
    (
    SELECT ROW_NUMBER() OVER 
        (ORDER BY  
              [Inner1].[producttitle]) AS [ROW_NUMBER], 
        [Inner1].[productdiagramid], 
        [Inner1].[productid], 
        [Inner1].[producttitle]
    FROM
        (
        SELECT DISTINCT pdp.productdiagramid, p.id as productid, p.title as producttitle
        FROM manufacturers m
        INNER JOIN products p ON m.id = p.manufacturerid
        INNER JOIN productdiagramparts pdp ON p.id = pdp.productid
        INNER JOIN productdiagrams pd ON pdp.productdiagramid = pd.id
        WHERE #whereclause#
        ) AS Inner1
    ) as Inner2
WHERE Inner2.[ROW_NUMBER] BETWEEN @startrow + 1 AND  @startrow + @recsperpage
ORDER BY [Inner2].[ROW_NUMBER]

Upvotes: 0

bobs
bobs

Reputation: 22194

Here's an approach that uses the ROW_NUMBER function a second time. The inner-most SELECT assigns row-numbers based on the groups of duplicates. Then, only the rows with a row-number of 1 are returned to remove the duplicates. And, finally, the process is done to assign a row number.

SELECT TOP #arguments.perPage# *
FROM (

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS rownum,
        productdiagramparts.productdiagramid AS productdiagramid, 
        products.id AS productid, products.title AS producttitle, totalRows = COUNT(*) OVER()
    FROM (

        SELECT ROW_NUMBER() OVER PARTITION BY productdiagramparts.productdiagramid, products.id ORDER BY (SELECT 1)) AS dup_sequence, *
        FROM manufacturers
        INNER JOIN products ON manufacturers.id = products.manufacturerid
        INNER JOIN productdiagramparts ON products.id = productdiagramparts.productid
        INNER JOIN productdiagrams ON productdiagramparts.productdiagramid = productdiagrams.id
        WHERE #whereClause#

        ) _tmpDupRemove
    WHERE dup_sequence = 1

    ) _tmpInlineView

WHERE rownum > #offset#
ORDER BY producttitle

Upvotes: 1

Related Questions