Reputation: 7383
This (modified for simplicity) query is part of a larger query, and joined on date with other selects. However I have pinned this section to be dog slow. Say I have a UserLoginHistory-table that logs each login for a user. For each user I want the date they first logged in. (Later on in the query, I group by LogDate to get how many first-time logins there were each day.)
select
LogDate, --(this value is only date, no time)
UserId
from
UserLoginHistory ul
where
not exists
(
select
*
from
UserLoginHistory ulPrevious
where
ulPrevious.LogDate < ul.LogDate
and ul.UserId = ulPrevious.UserId
)
group by ul.LogDate, ul.UserId
Obviously the NOT EXISTS-part is the slow one. But I can't figure out how to replace it by something more efficient doing the same job.
With a small UserLogHistory-count, performance is no problem. It's when I get to around 15 000 it starts getting slow. Maybe I should batch the result for each day into another table, but I'd like to find a better solution to this query as there should be one out there...
Thanks for your time!
Upvotes: 3
Views: 3518
Reputation: 69789
If these are the only 2 fields you are interested in could you not use a simple aggregate?
SELECT LogDate = MIN(LogDate),
UserID
FROM UserLoginHistory
GROUP BY UserID;
Upvotes: 4
Reputation:
You can use a row numbering method:
select LogDate,UserId from (
select
LogDate,
UserId
row_number() over (partition by UserId order by LogDate) as rown
from
UserLoginHistory ul
)
where rown = 1
Rows for each ID are numbered by LogDate, so the earliest one will always be numbered 1.
Note: I don't think the group by
in your original query was necessary--the not exists
clause should guarantee that you only get unique combinations of UserId and LogDate.
Upvotes: 4