mcgovec9
mcgovec9

Reputation: 71

CTE Row_Number partition doesn't remove all duplicates

I have the following requirement to remove any duplicate zero value rows, where there exists a matching row with a non zero value. The match case is done on all the other columns.

for example,

values  |   type    | month_year |   cli_name  |  media  |  ordering
50      | Revenue   |  Nov_20    |   google    | agency  |  sample
0       | Revenue   |  Nov_20    |   google    | agency  |  sample

I want the zero row here removed, because there is a matching row on the type,month_year,cli_name,media,ordering columns, and the values column is not zero.

I have the following cte function in order to do this,

WITH CTE AS(
SELECT a.*,ROW_NUMBER() OVER (PARTITION BY a.type, a.month_year, a.cli_name, a.media, a.ordering
                              ORDER BY  a.type, a.month_year, a.cli_name, a.media, a.ordering)as RN
 FROM mytable a
 )
DELETE FROM CTE WHERE RN > 1 and [values] = 0

The behaviour of the function is quite strange. When first run, it only removes a portion of the duplicates. I need to run it a couple more times, and each time, it picks up more duplicates and removes them, until I run it (in my scenario, the 4th time) and it doesn't remove anymore. The problem here is that duplicate zero value rows still remain in the table.

I know this is the case because running the below query still returns results after the cte function has stopped finding/removing duplicates:

SELECT DISTINCT b.[values],a.[values], b.[type], b.month_year, b.cli_name, b.media, b.ordering 
FROM mytable a
JOIN mytable   b    on b.[type] = a.[type]
                    and b.month_year= a.month_year
                    and b.cli_name= a.cli_name
                    and b.media = a.media
                    and b.ordering= a.ordering
  where a.[values] <> 0 and b.[values] = 0

So i've 2 questions,

  1. Why do I need to run it more than once for it to find more duplicates?
  2. Why does it not find all duplicates, no matter how many times its run?

I assume there's a flaw with the cte function but i'm struggling to spot it, appreciate any help!

Upvotes: 0

Views: 731

Answers (1)

jim
jim

Reputation: 456

The ORDER BY allows zero values in the first position. These values will not be deleted (WHERE RN > 1 and [values] = 0). It could take several iterations before the non-zero number reaches the first position

Change ORDER BY to a.value DESC

The above assumes that non-zero values are positive numbers (as shown). IF a non-zero [value] can be positive or negative, you will need to order by the absolute value: ABS(a.value)

Upvotes: 1

Related Questions