Reputation: 336
I have to clean a database that ended up with duplicates due to incorrect application code.
To get the necessary data, I am joining the tables containing quiz users, questions and answers. This gives me:
UserId | QuestionId | AnswerId | ChoiceId | LastUpdated | MaxAnswers
--------------------------------------------------------------------------------
17 | 17 | 374526 | 65 | 2014-01-21 16:08:00.057 | 3
17 | 17 | 3497 | 61 | NULL | 3
17 | 17 | 3498 | 69 | NULL | 3
17 | 17 | 3499 | 70 | NULL | 3
17 | 17 | 3500 | 72 | NULL | 3
17 | 17 | 4071 | 62 | NULL | 3
17 | 17 | 4072 | 63 | NULL | 3
17 | 17 | 258050 | 64 | NULL | 3
17 | 43 | 4059 | 210 | NULL | 1
17 | 43 | 4060 | 210 | NULL | 1
17 | 110 | 533242 | 12 | NULL | 2
17 | 110 | 536466 | 12 | NULL | 2
17 | 110 | 577857 | 12 | 2015-09-24 09:13:15.127 | 2
I have to keep the top X answers for each Question per User, where X is MaxAnswers, ordered by LastUpdated DESC, then AnswerId DESC, and delete the rest - unless a ChoiceId comes up more than once, in which case just keep one row for that ChoiceId.
For a given QuestionId, MaxAnswers is always the same.
I currently have the above select (note: the data sample above originally had AnswerId ASC; it has been corrected), but I'm not sure how to go on from there (I assume using partition?).
EDIT: Expected output for this sample would be:
UserId | QuestionId | AnswerId | ChoiceId | LastUpdated | MaxAnswers
--------------------------------------------------------------------------------
17 | 17 | 374526 | 65 | 2014-01-21 16:08:00.057 | 3
17 | 17 | 258050 | 64 | NULL | 3
17 | 17 | 4072 | 63 | NULL | 3
17 | 43 | 4060 | 210 | NULL | 1
17 | 110 | 577857 | 12 | 2015-09-24 09:13:15.127 | 2
Upvotes: 0
Views: 55
Reputation: 6612
Please try the following code:
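-- Number each user's answers per question, newest first.
-- In SQL Server, NULL LastUpdated values sort last under DESC.
;with cte as (
    select
        *,
        rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
-- Delete every row ranked beyond the question's MaxAnswers.
delete u
from UserAnswers u
inner join cte
    on u.UserId = cte.UserId and
       u.QuestionId = cte.QuestionId and
       u.AnswerId = cte.AnswerId
where cte.rn > cte.MaxAnswers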
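Note that this query ranks rows only by LastUpdated and AnswerId, so for question 110 in the sample it would keep two rows with ChoiceId 12, while the expected output keeps one. A minimal sketch of one way to also cover the "one row per ChoiceId" rule follows. It assumes SQL Server, assumes AnswerId identifies a row within a user and question, and reads the rule as "keep the newest row of each ChoiceId, then apply MaxAnswers to what is left". The names dedup, ranked, choice_rn and answer_rn are made up for illustration.

```sql
;with dedup as (
    -- choice_rn = 1 marks the newest row for each (user, question, choice).
    select
        UserId, QuestionId, AnswerId, LastUpdated, MaxAnswers,
        choice_rn = row_number() over (
            partition by UserId, QuestionId, ChoiceId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
),
ranked as (
    -- Rank per (user, question), placing the choice_rn = 1 rows first so that
    -- their answer_rn counts only distinct choices.
    select
        UserId, QuestionId, AnswerId, MaxAnswers, choice_rn,
        answer_rn = row_number() over (
            partition by UserId, QuestionId
            order by case when choice_rn = 1 then 0 else 1 end,
                     LastUpdated desc, AnswerId desc)
    from dedup
)
delete u
from UserAnswers u
inner join ranked r
    on u.UserId = r.UserId and
       u.QuestionId = r.QuestionId and
       u.AnswerId = r.AnswerId
where r.choice_rn > 1                -- repeated ChoiceId: drop the older copies
   or r.answer_rn > r.MaxAnswers;    -- beyond the allowed number of answers
```

If the intended reading of the rule is different (for example, applying MaxAnswers first and deduplicating afterwards), the two conditions would need to be reordered accordingly.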
You can also refer to the following SQL tutorial, where the SQL Row_Number() function is used to delete duplicate rows.
Here is the test setup:
create table UserAnswers (
UserId int, QuestionId int, AnswerId int, ChoiceId int, LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17 , 17 , 374526 , 65 , '2014-01-21 16:08:00.057' , 3
insert into UserAnswers select 17 , 17 , 3497 , 61 , NULL , 3
insert into UserAnswers select 17 , 17 , 3498 , 69 , NULL , 3
insert into UserAnswers select 17 , 17 , 3499 , 70 , NULL , 3
insert into UserAnswers select 17 , 17 , 3500 , 72 , NULL , 3
insert into UserAnswers select 17 , 17 , 4071 , 62 , NULL , 3
insert into UserAnswers select 17 , 17 , 4072 , 63 , NULL , 3
insert into UserAnswers select 17 , 17 , 258050 , 64 , NULL , 3
insert into UserAnswers select 17 , 43 , 4059 , 210 , NULL , 1
insert into UserAnswers select 17 , 43 , 4060 , 210 , NULL , 1
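insert into UserAnswers select 17 , 110 , 533242 , 12 , NULL , 2
insert into UserAnswers select 17 , 110 , 536466 , 12 , NULL , 2
insert into UserAnswers select 17 , 110 , 577857 , 12 , '2015-09-24 09:13:15.127' , 2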
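Before deleting anything, it may be worth previewing which rows would be affected. The sketch below reuses the same ranking as the delete statement in this answer but only selects, so it changes no data. The WouldDelete column name is an illustrative choice, not something from the original schema.

```sql
-- Dry run: show each row's rank and whether the rn > MaxAnswers rule would remove it.
;with cte as (
    select
        *,
        rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
select
    UserId, QuestionId, AnswerId, ChoiceId, LastUpdated, MaxAnswers, rn,
    WouldDelete = case when rn > MaxAnswers then 1 else 0 end
from cte
order by UserId, QuestionId, rn;
```

Another option is to run the delete inside a transaction, inspect the remaining rows, and roll back if they do not match the expected output.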
Upvotes: 3