tdtm

Reputation: 336

Identify, delete duplicates

I have to clean a database that ended up with duplicates because of faulty application code.

To get the necessary data, I am joining the tables containing quiz users, questions and answers. This gives me:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 3497     | 61       | NULL                    | 3
17     | 17         | 3498     | 69       | NULL                    | 3
17     | 17         | 3499     | 70       | NULL                    | 3
17     | 17         | 3500     | 72       | NULL                    | 3
17     | 17         | 4071     | 62       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 43         | 4059     | 210      | NULL                    | 1
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 533242   | 12       | NULL                    | 2
17     | 110        | 536466   | 12       | NULL                    | 2
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

I have to keep the top X answers for each question per user, where X is MaxAnswers, ordered by LastUpdated DESC, then AnswerId DESC, and delete the rest. The exception is when a ChoiceId appears more than once, in which case only one row for that ChoiceId should be kept. For a given QuestionId, MaxAnswers is always the same.

I currently have the above select (note: the data sample above originally showed AnswerId ASC; that has been corrected), but I'm not sure how to proceed from there (I assume using partition?).

EDIT: Expected output for this sample would be:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

Upvotes: 0

Views: 55

Answers (1)

Eralper

Reputation: 6612

Please try the following code:

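;with cte as (
    select
        *,
        -- number each user's answers per question, newest first (NULL LastUpdated sorts last under DESC)
        rn = row_number() over (partition by UserId, QuestionId order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
-- delete through the alias so the target table is unambiguous
delete u
from UserAnswers u
inner join cte
    on  u.UserId = cte.UserId and
        u.QuestionId = cte.QuestionId and
        u.AnswerId = cte.AnswerId
-- rows ranked beyond MaxAnswers are removed; repeated ChoiceId values are not collapsed by this query
where cte.rn > cte.MaxAnswers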
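
The question also asks that a repeated ChoiceId be kept only once, which a single ranking by MaxAnswers does not cover. A minimal sketch of one way to handle both rules is to rank twice: first within each ChoiceId, then across the surviving rows. It assumes the same flat UserAnswers table used in this answer and that AnswerId is non-null and unique per user and question; the names dedup, ranked, choice_rn and r are only illustrative.

```sql
;with dedup as (
    select
        *,
        -- 1 = newest row for this ChoiceId; anything higher is a repeat of the same choice
        choice_rn = row_number() over (
            partition by UserId, QuestionId, ChoiceId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
),
ranked as (
    select
        *,
        -- rank only the survivors (one row per ChoiceId) within each user/question
        rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from dedup
    where choice_rn = 1
)
delete u
from UserAnswers u
left join ranked r
    on  u.UserId = r.UserId
    and u.QuestionId = r.QuestionId
    and u.AnswerId = r.AnswerId
    and r.rn <= r.MaxAnswers
-- a row with no match in the keep set is either a repeated choice or beyond MaxAnswers
where r.AnswerId is null;
```

On the sample in the question this should leave AnswerId 374526, 258050 and 4072 for question 17, 4060 for question 43 and 577857 for question 110, which matches the expected output. It is worth running it inside a transaction first and checking the remaining rows before committing.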

You can also refer to the following SQL tutorial, where the SQL Row_Number() function is used to delete duplicate rows.

The following script creates test data:

create table UserAnswers (
UserId int, QuestionId int,  AnswerId int,  ChoiceId int,  LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17     , 17         , 374526   , 65       , '2014-01-21 16:08:00.057' ,   3
insert into UserAnswers select 17     , 17         , 3497     , 61       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3498     , 69       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3499     , 70       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3500     , 72       , NULL        , 3
insert into UserAnswers select 17     , 17         , 4071     , 62       , NULL        , 3
insert into UserAnswers select 17     , 17         , 4072     , 63       , NULL        , 3
insert into UserAnswers select 17     , 17         , 258050   , 64       , NULL        , 3
insert into UserAnswers select 17     , 43         , 4059     , 210      , NULL        , 1
insert into UserAnswers select 17     , 43         , 4060     , 210      , NULL        , 1
insert into UserAnswers select 17     , 110        , 533242   , 12       , '2015-09-24 09:13:15.127' ,   2
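
Before running a delete against this test table, it can help to preview which rows a ranking by MaxAnswers would remove. The query below is a sketch that reuses the same row_number() ordering and only selects; the planned_action column is an illustrative name, not part of the table.

```sql
;with cte as (
    select
        *,
        -- same ordering as the delete: newest LastUpdated first, then highest AnswerId
        rn = row_number() over (partition by UserId, QuestionId order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
select
    UserId, QuestionId, AnswerId, ChoiceId, LastUpdated, MaxAnswers, rn,
    -- 'delete' marks rows ranked beyond MaxAnswers for their user/question
    planned_action = case when rn > MaxAnswers then 'delete' else 'keep' end
from cte
order by UserId, QuestionId, rn;
```

Because nothing is modified, this can be run repeatedly while adjusting the ordering or partitioning.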

Upvotes: 3
