David Jeffery
David Jeffery

Reputation: 47

How do I remove certain duplicates in a complex SQL query

I am writing a query and need it to Remove all duplicates of a.GenUserID but also keep the most recent login date ( that is b.LogDateTime) but this date must be older than 6 months. If there are later dates, they have to be removed. I hope this makes sense.

SELECT DISTINCT 
    a.GenUserID, 
    c.DeletionDate, 
    b.LogDateTime,
    (CASE c.Disabled WHEN 0 THEN 'NO' else 'YES - ARCHIVED' end)
FROM RioReport.dbo.GenUser a 
LEFT JOIN dbo.GenUserArchive c on a.GenUserID = c.GenUserID
LEFT JOIN dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
WHERE(a.Disabled=0 or c.Disabled=0)
    AND c.DeletionDate IS NOT NULL
    AND ((DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime IS NULL))
ORDER BY a.GenUserID, b.LogDateTime desc

Upvotes: 2

Views: 253

Answers (3)

trincot
trincot

Reputation: 350034

You could add the row_number() information to your query, and wrap that query into an outer query that just takes the records with number 1 from that result:

select      *
from        (
    select      a.GenUserID, 
                c.DeletionDate, 
                b.LogDateTime,
                case c.Disabled when 0 then 'NO' else 'YES - ARCHIVED' end as diabled,
                row_number() over (partition by a.GenUserID
                                   order by     b.LogDateTime desc) as rn
    from        RioReport.dbo.GenUser a 
    inner join  dbo.GenUserArchive c
            on  a.GenUserID = c.GenUserID
    left join   dbo.GenUserAccessHistory b
            on  a.GenUserID = b.ExtraInfo
    where       (a.Disabled=0 or c.Disabled=0)
    and         c.DeletionDate is not null
    and         (DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime is null)
    )
where       rn = 1            
order by    a.GenUserID

Note that you can turn the first left join into an inner join without any change to the result set, since you have a non-null check on one of its fields. inner join is then preferred, and might give a performance improvement.

If GenUserAccessHistory.LogDateTime is always non-null, then you can avoid the test or b.LogDateTime is null by moving the DateAdd(MM, -6, GetDate()) > b.LogDateTime condition to the appropriate join on clause.

The generated row number will be given in order of descending LogDateTime values, and restart from 1 for every different user.

Alternative without window functions

row_number() and other window functions are supported since SQL Server 2008. In comments you write you cannot use it. If that is the case, here is an alternative using a common table expression (supported since SQL Server 2005):

;with cte as (
    select      a.GenUserID, 
                c.DeletionDate, 
                b.LogDateTime,
                case c.Disabled when 0 then 'NO' else 'YES - ARCHIVED' end as disabled,
    from        RioReport.dbo.GenUser a 
    inner join  dbo.GenUserArchive c
            on  a.GenUserID = c.GenUserID
    left join   dbo.GenUserAccessHistory b
            on  a.GenUserID = b.ExtraInfo
    where       (a.Disabled=0 or c.Disabled=0)
    and         c.DeletionDate is not null
    and         (DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime is null)
    )    
select      *
from        cte main
where       LogDateTime is null
        or  not exists (select   1
                        from     cte sub
                        where    sub.GenUserID = main.GenUserID
                        and      sub.LogDateTime > main.LogDateTime)
order by    GenUserID

Upvotes: 2

sandeep rawat
sandeep rawat

Reputation: 4957

Use cte and window function

  ;with ctr as (
    select a.GenUserID, a.DeletionDate, a.LogDateTime 
    row_number()over(partition by a.GenUserID order by b.LogDateTime desc) rnk

    from RioReport.dbo.GenUser a )
    select a.GenUserID, a.DeletionDate, a.LogDateTime,
        CASE  WHEN DATEDIFF(mm,LogDateTime,getdate())<6 THEN 'NO' else 'YES - ARCHIVED' end)
     from ctr a where a.rnk=1

Upvotes: 2

Unnikrishnan R
Unnikrishnan R

Reputation: 5031

Try with the below query.

;WITH CTE_Group
AS(
SELECT 
    ROW_NUMBER() OVER (PARTITION BY a.GenUserID ORDER BY b.LogDateTime DESC) as RNO, 
    a.GenUserID, 
    c.DeletionDate, 
    b.LogDateTime,
    (CASE c.Disabled WHEN 0 THEN 'NO' else 'YES - ARCHIVED' end) IsArchived
FROM RioReport.dbo.GenUser a 
LEFT JOIN dbo.GenUserArchive c on a.GenUserID = c.GenUserID
LEFT JOIN dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
WHERE(a.Disabled=0 or c.Disabled=0)
    AND c.DeletionDate IS NOT NULL
    AND ((DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime IS NULL)))
    SELECT  GenUserID, 
            DeletionDate, 
            LogDateTime,
            IsArchived
    FROM WITH_CTE_Group
    WHERE RNO=1

Upvotes: 2

Related Questions