Reputation: 23
I have a table named visiting that looks like this:
id | visitor_id | visit_time
-------------------------------------
1 | 1 | 2009-01-06 08:45:02
2 | 1 | 2009-01-06 08:58:11
3 | 1 | 2009-01-06 09:08:23
4 | 1 | 2009-01-06 21:55:23
5 | 1 | 2009-01-06 22:03:35
I want to work out a sql that can get how many times a user visits within one session(successive visit's interval less than 1 hour).
So, for the example data, I want to get following result:
visitor_id | count
-------------------
1 | 3
1 | 2
BTW, I use postgresql 8.3. Thanks!
UPDATE: updated the timestamps in the example data table. sorry for the confusion.
UPDATE: I don't care much if the solution is a single sql query, using store procedure, subquery etc. I only care how to get it done :)
Upvotes: 2
Views: 724
Reputation: 39393
PostgreSQL 8.4 will have a windowing function, by then we can eliminate creating temporary table just to simulate rownumbers (sequence purposes)
create table visit
(
visitor_id int not null,
visit_time timestamp not null
);
insert into visit(visitor_id, visit_time)
values
(1, '2009-01-06 08:45:02'),
(2, '2009-02-06 08:58:11'),
(1, '2009-01-06 08:58:11'),
(1, '2009-01-06 09:08:23'),
(1, '2009-01-06 21:55:23'),
(2, '2009-02-06 08:59:11'),
(2, '2009-02-07 00:01:00'),
(1, '2009-01-06 22:03:35');
create temp table temp_visit(visitor_id int not null, sequence serial not null, visit_time timestamp not null);
insert into temp_visit(visitor_id, visit_time) select visitor_id, visit_time from visit order by visitor_id, visit_time;
select
reference.visitor_id, count(nullif(reference.visit_time - prev.visit_time < interval '1 hour',false))
from temp_visit reference
left join temp_visit prev
on prev.visitor_id = reference.visitor_id and prev.sequence = reference.sequence - 1
group by reference.visitor_id;
Upvotes: 1
Reputation: 13761
There is no way to do this in a single SQL statment.
Below are 2 ideas: one uses a loop to count visits, the other changes the way the visiting
table is populated.
However, it can be done without too much trouble with a loop.
(I have tried to get the postgresql syntax correct, but I'm no expert)
/* find entries where there is no previous entry for */
/* the same visitor within the previous hour: */
select v1.* , 0 visits
into temp_table
from visiting v1
where not exists ( select 1
from visiting v2
where v2.visitor_id = v1.visitor_id
and v2.visit_time < v1.visit_time
and v1.visit_time - interval '1 hour' < v2.visit_time
)
select @rows = @@rowcount
while @rows > 0
begin
update temp_table
set visits = visits + 1 ,
last_time = v.visit_time
from temp_table t ,
visiting v
where t.visitor_id = v.visitor_id
and v.visit_time - interval '1 hour' < t.last_time
and not exists ( select 1
from visiting v2
where v2.visitor_id = t.visitor_id
and v2.visit_time between t.last_time and v.visit_time
)
select @rows = @@rowcount
end
/* get the result: */
select visitor_id,
visits
from temp_table
The idea here is to do this:
I suggest:
visiting
table: session_id int not null
session_id
to the same as the session id
for that earlier visit. If not, it generates a new session_id
.Then your original query can be solved by:
SELECT session_id, visitor_id, count(*)
FROM visiting
GROUP BY session_id, visitor_id
Hope this helps. If I've made mistakes (I'm sure I have), leave a comment and I'll correct it.
Upvotes: 1
Reputation: 75794
The question is slightly ambiguous because you're making the assumption or requiring that the hours are going to start at a set point, i.e. a natural query would also indicate that there's a result record of (1,2) for all the visits between the hour of 08:58 and 09:58. You would have to "tell" your query that the start times are for some determinable reason visits 1 and 4, or you'd get the natural result set:
visitor_id | count
--------------------
1 | 3
1 | 2 <- extra result starting at visit 2
1 | 1 <- extra result starting at visit 3
1 | 2
1 | 1 <- extra result starting at visit 5
That extra logic is going to be expensive and too complicated for my fragile mind this morning, somebody better than me at postgres can probably solve this.
I would normally want to solve this by having a sessionkey column in the table I could cheaply group by for perforamnce reasons, but there's also a logical problem I think. Deriving session info from timings seems dangerous to me because I don't believe that the user will be definitely logged out after an hours activity. Most session systems work by expiring the session after a period of inactivity, i.e. it's very likely that a visit after 9:45 is going to be in the same session because your hourly period is going to be reset at 9:08.
Upvotes: 1
Reputation: 10552
The problem seems a little fuzzy.
It gets more complicated as id 3 is within an hour of id 1 and 2, but if the user had visited at 9:50 then that would have been within an hour of 2 but not 1.
You seem to be after a smoothed total - for a given visit, how many visits are within the following hour?
Perhaps you should be asking for how many visits have a succeeding visit less than an hour distant? If a visit is less than an hour from the preceeding one then should it 'count'?
So what you probably want is how many chains do you have where the links are less than an arbitrary amount (so the hypothetical 9:50 visit would be included in the chain that starts with id 1).
Upvotes: 1
Reputation: 19591
If it were T-SQL, I would write something as:
SELECT visitor_id, COUNT(id),
DATEPART(yy, visit_time), DATEPART(m, visit_time),
DATEPART(d, visit_time), DATEPART(hh, visit_time)
FROM visiting
GROUP BY
visitor_id,
DATEPART(yy, visit_time), DATEPART(m, visit_time),
DATEPART(d, visit_time), DATEPART(hh, visit_time)
which gives me:
1 3 2009 1 6 8
1 2 2009 1 6 21
I do not know how or if you can write this in postgre though.
Upvotes: 0
Reputation: 4066
This can't be done in a single SQL. The better option is to handle it in stored procedure
Upvotes: 0
Reputation: 4578
One or both of these may work? However, both will end up giving you more columns in the result than you are asking for.
SELECT visitor_id,
date_part('year', visit_time),
date_part('month', visit_time),
date_part('day', visit_time),
date_part('hour', visit_time),
COUNT(*)
FROM visiting
GROUP BY 1, 2, 3, 4, 5;
SELECT visitor_id,
EXTRACT(EPOCH FROM visit_time)-(EXTRACT(EPOCH FROM visit_time) % 3600),
COUNT(*)
FROM visiting
GROUP BY 1, 2;
Upvotes: 0