Performance with NOT EXISTS - t-sql query

Question

This (modified for simplicity) query is part of a larger query, and joined on date with other selects. However I have pinned this section to be dog slow. Say I have a UserLoginHistory-table that logs each login for a user. For each user I want the date they first logged in. (Later on in the query, I group by LogDate to get how many first-time logins there were each day.)

select
    LogDate, --(this value is only date, no time)
    UserId
from
    UserLoginHistory ul
where
    not exists
        (
            select 
                * 
            from 
                UserLoginHistory ulPrevious
            where
                ulPrevious.LogDate < ul.LogDate
                and ul.UserId = ulPrevious.UserId
        )
group by ul.LogDate, ul.UserId

Obviously the NOT EXISTS-part is the slow one. But I can't figure out how to replace it by something more efficient doing the same job.

With a small UserLogHistory-count, performance is no problem. It's when I get to around 15 000 it starts getting slow. Maybe I should batch the result for each day into another table, but I'd like to find a better solution to this query as there should be one out there...

Thanks for your time!

user1919238 · Accepted Answer

You can use a row numbering method:

select LogDate,UserId from (
    select
       LogDate, 
       UserId
       row_number() over (partition by UserId order by LogDate) as rown
    from
        UserLoginHistory ul
)
where rown = 1

Rows for each ID are numbered by LogDate, so the earliest one will always be numbered 1.

Note: I don't think the group by in your original query was necessary--the not exists clause should guarantee that you only get unique combinations of UserId and LogDate.

Performance with NOT EXISTS - t-sql query

Answers (2)

Related Questions