Using SQL to query user_id, time of beginning and ending the session if we know only user_id, actions_made and time of visit

Question

My task was to open trial cluster Redshift and to upload there data from csv file.

In csv file was given information about id of a user, what page he has visited and time of visit up to milliseconds.

The task is to find all "sessions" where user have done the next actions in the given order: rooms.homework-showcase, rooms.view.step.content,rooms.lesson.rev.step.content. (He can did it not successively).

By "session" it is meant that time passed between actions no more than one hour.

The output should consists of users id, time of the start of session and time of the end of the session.

Sample of what is given in csv:

57529,rooms.homework-showcase,2017-03-01T00:00:07.710000
57529,rooms.view.step.content,2017-03-01T00:00:10.275000
57529,rooms.view.step.content,2017-03-01T00:00:10.436000
168671,rooms.view.step.content,2017-03-01T00:00:12.035000
168671,rooms.view.step.content,2017-03-01T00:00:50.632000
64788,rooms.view.step.content,2017-03-01T00:01:21.460000
93698,rooms.view.step.content,2017-03-01T00:02:41.963000
205265,rooms.homework-showcase,2017-03-01T00:02:45.241000
205265,rooms.test-showcase,2017-03-01T00:02:57.854000
205265,notes,2017-03-01T00:03:01.016000

What I have written so far:

select *, 
case
when timepass IS NULL then ncount
when (timepass>=3600) and lag(timepass<3600) over(order by user_id, datet) 
then ncount
else Null
end as startt,

case
when (timepass<3600) and lead(timepass IS NULL) over(order by user_id, 
datet) then ncount
when (timepass<3600) and lead(timepass>=3600) over(order by user_id, datet) 
then ncount

else Null
end as endt

from
(
select *, row_number() OVER(ORDER BY user_id, datet) as ncount, 
case 
when page_name = 'rooms.homework-showcase' then 0 
when page_name = 'rooms.view.step.content' then 1
when page_name = 'rooms.lesson.rev.step.content' then 2
else 3
end as page_order
from
(
select *, EXTRACT(EPOCH from datet) - lag(EXTRACT(EPOCH from datet)) 
over(partition by user_id order by user_id, datet) as timepass from
(
SELECT *, to_timestamp(date_time, 'YYYY-MM-DD HH:MI:SS:US') as datet from 
testtbl
order by date_time asc
) as t1

group by user_id, page_name, date_time, datet
order by user_id, datet asc 
) as t3
) as t4
;

my output result

I can figure out the beginnig of the session and the end (except I don't know how to include the last row as the end of session). One user can have several sessions.

How can I query the right sessions and take from them user_id, start and end of session? Could you give me some ideas? I don't have enough knowledge in SQL to see the way of writing the right query.

P.S. I've attached how output should look like. This shows particular user and his one session which falls under given criteria. And expected output is a last picture. And there should be listed all of "good" sessions.

The csv file is under this link https://drive.google.com/open?id=0B288Ep6mrn6fWlQ0a3doV0hjQjA

DB Fiddle https://www.db-fiddle.com/f/bz92LHiTW5RrFBUkaH2KtR/0#&togetherjs=m6SnA9xLSS

Using SQL to query user_id, time of beginning and ending the session if we know only user_id, actions_made and time of visit

Answers (1)

Related Questions