Reputation: 507
I'm working with a database that have a poor design that does not constraint duplicates rows as long as they have a different unique-identifier.
Within one of the table, a given user can have an attribute and a value for the attribute. Normally, a user would only a have a single time the attribute but because of the poor design, I'm getting a lot of duplicates in the table and now I need to clean that mess. This is due to the CRM software not always checking if the row exists when we modify the employee profile but instead it creates a bunch of new rows with duplicates values.
The following query returns the duplicates values:
SELECT ua.ID AS LineID
,ua.Modified AS LineLastModifiedDate
,u.FullName AS EmployeeName
,a.Name AS AttributeName
,ua.value AS AttributeValue
FROM UserAttributes AS ua
INNER JOIN Users AS u ON ua.userid = u.id
INNER JOIN Attributes AS a ON ua.AttributeID = a.ID
WHERE EXISTS (
SELECT NULL
FROM UserAttributes as ua2
WHERE ua2.UserID = ua.UserID
AND ua2.AttributeID = ua.AttributeID
AND ua2.ID != ua.ID
)
And produces results as this:
LineID LineLastModifiedDate EmployeeName AttributeName AttributeValue
------ ----------------------- ------------- --------------- ---------------
15 2016-01-01 Employee1 EmployeeNumber 15
19 2016-07-20 Employee1 EmployeeNumber 15
35 2016-01-01 Employee2 EmployeeSex M
96 2016-07-20 Employee2 EmployeeSex M
21 2016-03-03 Employee1 SickDays 3
99 2016-07-10 Employee1 SickDays 5
What I need to accomplish starting from this query is : ForEach grouping of the same EmployeeName and AttributeName, give me the last modified line expecting results like this :
LineID LineLastModifiedDate EmployeeName AttributeName AttributeValue
------ ----------------------- ------------- --------------- ---------------
19 2016-07-20 Employee1 EmployeeNumber 15
96 2016-07-20 Employee2 EmployeeSex M
99 2016-07-10 Employee1 SickDays 5
How can I modify my query to accomplish this ?
Thank you
-M
Upvotes: 0
Views: 60
Reputation: 17925
You can use row numbering or a scheme as below where you pull out the max value and then use a join. Presumably you can't have ties by date.
select ...
from
UserAttributes as ua
inner join
(
select
UserID, AttributeID,
max(LineLastModifiedDate) as MaxLineLastModifiedDate
fromUserAttributes
group by UserId
) as max_ua
on max_ua.UserID = ua.UserID
and max_ua.AttributeID = max_ua.AttributeID
and max_ua.MaxLineLastModifiedDate = ua.LineLastModifiedDate
...
Upvotes: 0
Reputation: 28930
;WITH CTE
AS
(
SELECT ua.ID AS LineID
,ua.Modified AS LineLastModifiedDate
,u.FullName AS EmployeeName
,a.Name AS AttributeName
,ua.value AS AttributeValue
,ROW_NUMBER() OVER (PARTITION BY EMPLOYEENAME,EMPLOYEESEX ORDER BY UA.Modified DESC) AS RN
FROM UserAttributes AS ua
INNER JOIN Users AS u ON ua.userid = u.id
INNER JOIN Attributes AS a ON ua.AttributeID = a.ID
WHERE EXISTS (
SELECT NULL
FROM UserAttributes as ua2
WHERE ua2.UserID = ua.UserID
AND ua2.AttributeID = ua.AttributeID
AND ua2.ID != ua.ID
)
)
SELECT * FROM cte where rn=1
Upvotes: 2