tarzanbappa
tarzanbappa

Reputation: 4958

SQL Server using CROSS APPLY to avoid duplicate records in SELECT statement

I have a table where there is no primary key or any unique column(No any auto increment column).

And I have to select records from that table avoiding duplicate records. I have used ROW_NUMBER() function for that and it's working for me.

Here is the working code

     WITH Result AS
        (
        SELECT ISNULL(ROW_NUMBER() OVER (ORDER BY 
                                          CategoryName
                                         ,ProductGroup
                                         ,Brand
                                         ,Division 
                                         ,ProductNo                                                                                                                     
                                         ,LocationNo                                                                                         
                                         ,Season                                                                                                
                                         ,Year                                                                                          
                                         ,Month                                                                                         
                                         ,Week                                                                                          
                                         ,Day                                                                                           
                                         ,DemandQty
                                      ), 0) AS RN, * 
        FROM TABLE_1
   )
SELECT * 
FROM Result AS D

WHERE RN IN
 (
    SELECT MAX(RN) MAX_RN
    FROM Result 
    GROUP BY
   CategoryName
  ,ProductGroup
  ,Brand
  ,Division 
  ,ProductNo                                                                                                                    
  ,LocationNo                                                                                        
  ,Season                                                                                               
  ,Year                                                                                         
  ,Month                                                                                            
  ,Week                                                                                         
  ,Day                                                                                          
  ,DemandQty
)   

This gives the result by avoiding duplicate records.

What I need to do is without writing this WITH clause I need to write it using CROSS APPLY because it'll take too much time to select records when there are about 500000 records with the Row Number

So far I tried to re-write the query just like below to avoid duplicate records.

SELECT * FROM 
TABLE_1 d
CROSS APPLY (
SELECT TOP 1 * FROM
TABLE_1 x
WHERE x.[Season] = d.Season
AND ISNULL(x.[Brand]            ,'')= ISNULL(d.[Brand]            ,'')
AND ISNULL(x.[Division]         ,'')= ISNULL(d.[Division]         ,'')
AND ISNULL(x.[CategoryName]     ,'')= ISNULL(d.[CategoryName]     ,'')
AND ISNULL(x.[ProductGroup]     ,'')= ISNULL(d.[ProductGroup]     ,'')
AND ISNULL(x.[ProductNo]        ,'')= ISNULL(d.[ProductNo]        ,'')
AND ISNULL(x.[LocationNo]       ,'')= ISNULL(d.[LocationNo]       ,'')
AND ISNULL(x.[Year]             ,'')= ISNULL(d.[Year]             ,'')
AND ISNULL(x.[Month]            ,'')= ISNULL(d.[Month]            ,'')
AND ISNULL(x.[Week]             ,'')= ISNULL(d.[Week]             ,'')
AND ISNULL(x.[Day]              ,'')= ISNULL(d.[Day]              ,'')
AND ISNULL(x.[DemandQty]        ,'')= ISNULL(d.[DemandQty]        ,'')
AND ISNULL(x.[DeletionIndicator],'')= ISNULL(d.[DeletionIndicator],'')
AND ISNULL(x.[CreatedBy]        ,'')= ISNULL(d.[CreatedBy]        ,'')
AND ISNULL(x.[CreatedOn]        ,'')= ISNULL(d.[CreatedOn]        ,'')
AND ISNULL(x.[UpdatedBy]        ,'')= ISNULL(d.[UpdatedBy]        ,'')
AND ISNULL(x.[UpdatedOn]        ,'')= ISNULL(d.[UpdatedOn]        ,'')
) y

But it's not giving the correct result for me still duplicate records are coming?

Assume that there are 10 identical rows along with other rows, I need to retrieve only one record from those 10 identical records along with other rows that are not duplicated.

Is there any way to do this without using ROW_NUMBER and WITH ?

Upvotes: 1

Views: 6484

Answers (1)

Lukasz Szozda
Lukasz Szozda

Reputation: 175826

To avoid duplicates you can use DISTINCT:

SELECT DISTINCT 
      CategoryName
     ,ProductGroup
     ,Brand
     ,Division 
     ,ProductNo         
     ,LocationNo                 
     ,Season                        
     ,Year                  
     ,Month                 
     ,Week                  
     ,Day                   
    ,DemandQty
FROM TABLE_1;

Or GROUP BY:

 SELECT
      CategoryName
     ,ProductGroup
     ,Brand
     ,Division 
     ,ProductNo         
     ,LocationNo                 
     ,Season                        
     ,Year                  
     ,Month                 
     ,Week                  
     ,Day                   
    ,DemandQty
FROM TABLE_1
GROUP BY
      CategoryName
     ,ProductGroup
     ,Brand
     ,Division 
     ,ProductNo         
     ,LocationNo                 
     ,Season                        
     ,Year                  
     ,Month                 
     ,Week                  
     ,Day                   
     ,DemandQty;

For comparing values which can contain NULL instead of

AND ISNULL(x.[Brand],'')= ISNULL(d.[Brand],'') you can use:

WHERE EXISTS (
  SELECT
      x.CategoryName
     ,x.ProductGroup
     ,x.Brand
     ,x.Division 
     ,x.ProductNo         
     ,x.LocationNo                 
     ,x.Season                        
     ,x.Year                  
     ,x.Month                 
     ,x.Week                  
     ,x.Day                   
     ,x.DemandQty
  INTERSECT
  SELECT
      d.CategoryName
     ,d.ProductGroup
     ,d.Brand
     ,d.Division 
     ,d.ProductNo         
     ,d.LocationNo                 
     ,d.Season                        
     ,d.Year                  
     ,d.Month                 
     ,d.Week                  
     ,d.Day                   
     ,d.DemandQty
)

Upvotes: 1

Related Questions