Query to find all timestamps more than a certain interval apart

Question

I'm using postgres to run some analytics on user activity. I have a table of all requests(pageviews) made by every user and the timestamp of the request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I'm considering every set of requests an hour or more apart from others as a distinct session. The data looks something like this:

id|          request_time|         user_id
1    2014-01-12 08:57:16.725533    1233
2    2014-01-12 08:57:20.944193    1234
3    2014-01-12 09:15:59.713456    1233
4    2014-01-12 10:58:59.713456    1234

How can I write a query to get the number of sessions per user?

Erwin Brandstetter · Accepted Answer

To start a new session after every gap >= 1 hour:

SELECT user_id, count(*) AS distinct_sessions
FROM (
   SELECT user_id
        ,(lag(request_time, 1, '-infinity') OVER (PARTITION BY user_id
                                                  ORDER BY request_time)
           <= request_time - '1h'::interval) AS step -- start new session
   FROM   tbl
   ) sub
WHERE  step
GROUP  BY user_id
ORDER  BY user_id;

Assuming request_time NOT NULL.

Explain:

In subquery sub, check for every row if a new session begins. Using the third parameter of lag() to provide the default -infinity, which is lower than any timestamp and therefore always starts a new session for the first row.
In the outer query count how many times new sessions started. Eliminate step = FALSE and count per user.

Alternative interpretation

If you really wanted to count hours where at least one request happened (I don't think you do, but another answer assumes as much), you would:

SELECT user_id
     , count(DISTINCT date_trunc('hour', request_time)) AS hours_with_req
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Query to find all timestamps more than a certain interval apart

Answers (1)

Explain:

Alternative interpretation

Related Questions