Martin Lebel
Martin Lebel

Reputation: 507

T-SQL : How to obtain the last modified row from a grouping

I'm working with a database that have a poor design that does not constraint duplicates rows as long as they have a different unique-identifier.

Within one of the table, a given user can have an attribute and a value for the attribute. Normally, a user would only a have a single time the attribute but because of the poor design, I'm getting a lot of duplicates in the table and now I need to clean that mess. This is due to the CRM software not always checking if the row exists when we modify the employee profile but instead it creates a bunch of new rows with duplicates values.

The following query returns the duplicates values:

SELECT ua.ID AS LineID
    ,ua.Modified AS LineLastModifiedDate
    ,u.FullName AS EmployeeName
    ,a.Name AS AttributeName
    ,ua.value AS AttributeValue

FROM UserAttributes AS ua 
    INNER JOIN Users AS u ON ua.userid = u.id
    INNER JOIN Attributes AS a ON ua.AttributeID = a.ID

WHERE EXISTS (
    SELECT NULL 
    FROM UserAttributes as ua2 
    WHERE ua2.UserID = ua.UserID 
        AND ua2.AttributeID = ua.AttributeID 
        AND ua2.ID != ua.ID
    )

And produces results as this:

LineID LineLastModifiedDate    EmployeeName  AttributeName   AttributeValue
------ ----------------------- ------------- --------------- ---------------
15     2016-01-01              Employee1     EmployeeNumber  15             
19     2016-07-20              Employee1     EmployeeNumber  15             
35     2016-01-01              Employee2     EmployeeSex     M              
96     2016-07-20              Employee2     EmployeeSex     M              
21     2016-03-03              Employee1     SickDays        3              
99     2016-07-10              Employee1     SickDays        5            

What I need to accomplish starting from this query is : ForEach grouping of the same EmployeeName and AttributeName, give me the last modified line expecting results like this :

LineID LineLastModifiedDate    EmployeeName  AttributeName   AttributeValue
------ ----------------------- ------------- --------------- ---------------
19     2016-07-20              Employee1     EmployeeNumber  15             
96     2016-07-20              Employee2     EmployeeSex     M
99     2016-07-10              Employee1     SickDays        5                          

How can I modify my query to accomplish this ?

Thank you

-M

Upvotes: 0

Views: 60

Answers (2)

shawnt00
shawnt00

Reputation: 17925

You can use row numbering or a scheme as below where you pull out the max value and then use a join. Presumably you can't have ties by date.

select ...
from
    UserAttributes as ua
    inner join
    (
    select
        UserID, AttributeID,
        max(LineLastModifiedDate) as MaxLineLastModifiedDate
    fromUserAttributes
group by UserId
    ) as max_ua
        on      max_ua.UserID = ua.UserID
            and max_ua.AttributeID = max_ua.AttributeID
            and max_ua.MaxLineLastModifiedDate = ua.LineLastModifiedDate
    ...

Upvotes: 0

TheGameiswar
TheGameiswar

Reputation: 28930

;WITH CTE
AS
(
SELECT ua.ID AS LineID
    ,ua.Modified AS LineLastModifiedDate
    ,u.FullName AS EmployeeName
    ,a.Name AS AttributeName
    ,ua.value AS AttributeValue
    ,ROW_NUMBER() OVER (PARTITION BY EMPLOYEENAME,EMPLOYEESEX ORDER BY UA.Modified  DESC) AS RN
FROM UserAttributes AS ua 
    INNER JOIN Users AS u ON ua.userid = u.id
    INNER JOIN Attributes AS a ON ua.AttributeID = a.ID

WHERE EXISTS (
    SELECT NULL 
    FROM UserAttributes as ua2 
    WHERE ua2.UserID = ua.UserID 
        AND ua2.AttributeID = ua.AttributeID 
        AND ua2.ID != ua.ID
    )
)
SELECT * FROM cte where  rn=1

Upvotes: 2

Related Questions