Reputation: 6170
I have the table below with the columns user_id, timestamp, event_id and tag. The tag column denotes whether the row is a desirable one (tag = 1) or not (tag = 0):
user_id | timestamp | event_id | tag
46 | 2018-12-21 08:42:35.000 | 1 | 0
46 | 2018-12-21 09:58:35.000 | 2 | 1
46 | 2018-12-22 06:42:35.000 | 3 | 0
46 | 2018-12-22 07:18:35.000 | 4 | 1
46 | 2018-12-22 08:30:35.000 | 5 | 1
46 | 2018-12-23 06:42:35.000 | 6 | 0
46 | 2018-12-23 06:11:35.000 | 7 | 1
46 | 2018-12-23 07:51:35.000 | 8 | 1
46 | 2018-12-23 07:26:35.000 | 9 | 1
46 | 2018-12-23 07:37:35.000 | 10 | 1
46 | 2018-12-23 08:05:35.000 | 11 | 1
46 | 2018-12-23 08:20:35.000 | 12 | 1
46 | 2018-12-23 09:10:35.000 | 13 | 1
46 | 2018-12-23 09:42:35.000 | 14 | 0
46 | 2018-12-23 10:17:35.000 | 15 | 1
46 | 2018-12-24 09:42:35.000 | 16 | 0
46 | 2018-12-24 10:45:35.000 | 17 | 0
46 | 2018-12-24 11:12:35.000 | 18 | 0
46 | 2018-12-24 11:51:35.000 | 19 | 1
122 | 2018-12-22 08:30:35.000 | 1 | 1
122 | 2018-12-23 06:42:35.000 | 2 | 0
122 | 2018-12-23 06:11:35.000 | 3 | 1
122 | 2018-12-23 07:51:35.000 | 4 | 1
122 | 2018-12-23 07:26:35.000 | 5 | 1
122 | 2018-12-23 07:37:35.000 | 6 | 1
122 | 2018-12-28 06:42:35.000 | 1 | 0
122 | 2018-12-28 06:38:35.000 | 2 | 1
122 | 2018-12-28 07:51:35.000 | 3 | 1
122 | 2018-12-28 07:26:35.000 | 4 | 1
122 | 2018-12-28 08:42:35.000 | 5 | 0
122 | 2018-12-28 09:38:35.000 | 6 | 0
122 | 2018-12-28 10:51:35.000 | 7 | 0
122 | 2018-12-28 11:26:35.000 | 8 | 0
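So I would like to find:
1. The number of times the user had 3 consecutive events with tag = 1 (i.e. triplets) on the same date.
2. The timestamp of the first event of each of those triplets.
Ideally the returned table should look like this: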
user_id | first_occurrence | event_id | consecutive_events
46 | 2018-12-23 06:11:35.000 | 7 | 2 <-- 2 consecutive triplets
46 | 2018-12-23 07:37:35.000 | 10 | 2 <-- this has 4 consecutive events, but I am only interested in triplets of events.
122 | 2018-12-23 06:11:35.000 | 4 | 1
122 | 2018-12-28 06:38:35.000 | 2 | 1
In other words, the consecutive_events column must display the number of triplets for a user on each day, while the first_occurrence and event_id columns should display the first timestamp and event_id of each triplet per user_id and date.
NOTE: user_id 46 also has a triplet of zeros (tag = 0). Such triplets should be excluded:
46 | 2018-12-24 09:42:35.000 | 16 | 0
46 | 2018-12-24 10:45:35.000 | 17 | 0
46 | 2018-12-24 11:12:35.000 | 18 | 0
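To pin down the rules described above, here is a minimal pure-Python sketch (not SQL) that rebuilds the expected output from the sample table. It rests on two assumptions that the question does not state explicitly: rows inside one user_id and date are ordered by event_id rather than by timestamp (the sample timestamps are not monotonic, e.g. event 7 at 06:11:35 comes after event 6 at 06:42:35), and a run of n rows with tag = 1 contains n // 3 non-overlapping triplets. The helpers parse and find_triplets are hypothetical names.

```python
from collections import defaultdict
from itertools import groupby

# The question's sample table, copied as-is: user_id | timestamp | event_id | tag
DATA = """
46 | 2018-12-21 08:42:35.000 | 1 | 0
46 | 2018-12-21 09:58:35.000 | 2 | 1
46 | 2018-12-22 06:42:35.000 | 3 | 0
46 | 2018-12-22 07:18:35.000 | 4 | 1
46 | 2018-12-22 08:30:35.000 | 5 | 1
46 | 2018-12-23 06:42:35.000 | 6 | 0
46 | 2018-12-23 06:11:35.000 | 7 | 1
46 | 2018-12-23 07:51:35.000 | 8 | 1
46 | 2018-12-23 07:26:35.000 | 9 | 1
46 | 2018-12-23 07:37:35.000 | 10 | 1
46 | 2018-12-23 08:05:35.000 | 11 | 1
46 | 2018-12-23 08:20:35.000 | 12 | 1
46 | 2018-12-23 09:10:35.000 | 13 | 1
46 | 2018-12-23 09:42:35.000 | 14 | 0
46 | 2018-12-23 10:17:35.000 | 15 | 1
46 | 2018-12-24 09:42:35.000 | 16 | 0
46 | 2018-12-24 10:45:35.000 | 17 | 0
46 | 2018-12-24 11:12:35.000 | 18 | 0
46 | 2018-12-24 11:51:35.000 | 19 | 1
122 | 2018-12-22 08:30:35.000 | 1 | 1
122 | 2018-12-23 06:42:35.000 | 2 | 0
122 | 2018-12-23 06:11:35.000 | 3 | 1
122 | 2018-12-23 07:51:35.000 | 4 | 1
122 | 2018-12-23 07:26:35.000 | 5 | 1
122 | 2018-12-23 07:37:35.000 | 6 | 1
122 | 2018-12-28 06:42:35.000 | 1 | 0
122 | 2018-12-28 06:38:35.000 | 2 | 1
122 | 2018-12-28 07:51:35.000 | 3 | 1
122 | 2018-12-28 07:26:35.000 | 4 | 1
122 | 2018-12-28 08:42:35.000 | 5 | 0
122 | 2018-12-28 09:38:35.000 | 6 | 0
122 | 2018-12-28 10:51:35.000 | 7 | 0
122 | 2018-12-28 11:26:35.000 | 8 | 0
"""


def parse(text):
    """Turn the pipe-separated sample into (user_id, timestamp, event_id, tag) tuples."""
    rows = []
    for line in text.strip().splitlines():
        user_id, ts, event_id, tag = (part.strip() for part in line.split("|"))
        rows.append((int(user_id), ts, int(event_id), int(tag)))
    return rows


def find_triplets(rows, size=3):
    """Return (user_id, first_occurrence, event_id, consecutive_events) tuples."""
    days = defaultdict(list)
    for user_id, ts, event_id, tag in rows:
        days[(user_id, ts[:10])].append((event_id, ts, tag))  # ts[:10] is the date
    result = []
    for (user_id, _day), events in sorted(days.items()):
        events.sort()  # assumption: event_id gives the order within a day
        starts = []
        for tag, run in groupby(events, key=lambda e: e[2]):
            run = list(run)
            if tag != 1:
                continue  # runs of tag = 0 are excluded, as the NOTE asks
            # Cut the run into non-overlapping blocks of `size`; leftovers are dropped.
            for i in range(0, len(run) - size + 1, size):
                event_id, ts, _ = run[i]
                starts.append((ts, event_id))
        for ts, event_id in starts:
            result.append((user_id, ts, event_id, len(starts)))
    return result


for row in find_triplets(parse(DATA)):
    print(row)
```

Under these assumptions it prints the four expected rows, except that user 122's row for 2018-12-23 gets event_id 3, which is the event at 06:11:35 in the sample data.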
I tried to use the DENSE_RANK() function, but the results are far from optimal:
dense_rank() over (partition by user_id, date(timestamp) order by tag,date(timestamp) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
[UPDATE]
The example I'm referring to in the first comment of Gordon's answer is the following. For these consecutive events:
user_id | timestamp | event_id | tag
46 | 2018-12-23 06:11:35.000 | 7 | 1
46 | 2018-12-23 07:51:35.000 | 8 | 1
46 | 2018-12-23 07:26:35.000 | 9 | 1
46 | 2018-12-23 07:37:35.000 | 10 | 1
46 | 2018-12-23 08:05:35.000 | 11 | 1
46 | 2018-12-23 08:20:35.000 | 12 | 1
46 | 2018-12-23 09:10:35.000 | 13 | 1
the query returns:
user_id | min(timestamp) | min_event_id | num_consecutive
46 | 2018-12-23 06:11:35.000 | 7 | 2
It should return both triplets:
user_id | min(timestamp) | min_event_id | num_consecutive
46 | 2018-12-23 06:11:35.000 | 7 | 2
46 | 2018-12-23 07:37:35.000 | 10 | 2
Do you think this is possible to fetch as well?
Upvotes: 0
Views: 559
Reputation: 1
The number of times the user had 3 (tag = 1 or tag = 0) consecutive events (i.e. triplets) in the same date.
The timestamp of the 1st event for each of those 3 consecutive events.
Ideally, the returned table should look like this:
with raw_data as (
    select
        user_id,
        timestamp,
        timestamp::date as date,
        event_id,
        tag,
        -- tags of the next two events on the same day (NULL near the end of the day)
        lead(tag, 1) over (partition by user_id, timestamp::date order by timestamp) as next_tag,
        lead(tag, 2) over (partition by user_id, timestamp::date order by timestamp) as next_tag_1
    from raw_data.test
)
select
    user_id,
    date,
    min(timestamp) as first_event,
    -- counts every event that starts a run of three tag = 1 events, so
    -- overlapping windows are all counted (a run of n such events gives n - 2)
    count(distinct event_id) as number_of_times
from raw_data
where tag = 1
  and next_tag = 1
  and next_tag_1 = 1
group by 1, 2;
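The lead-based query above can be tried locally with Python's built-in sqlite3 module. This is a sketch under several assumptions: it runs on SQLite (window functions need SQLite 3.25 or newer) rather than the asker's database, so timestamp::date becomes date(timestamp); the table is a plain table named test; and only user 46's rows for 2018-12-23 are loaded. Ordered by timestamp, that day holds a run of six tag = 1 events, and the query reports 4 for it, because every event that starts a run of three is counted rather than only non-overlapping triplets.

```python
import sqlite3

# Illustrative subset of the question's table: user 46 on 2018-12-23.
SAMPLE = [
    (46, "2018-12-23 06:42:35.000", 6, 0),
    (46, "2018-12-23 06:11:35.000", 7, 1),
    (46, "2018-12-23 07:51:35.000", 8, 1),
    (46, "2018-12-23 07:26:35.000", 9, 1),
    (46, "2018-12-23 07:37:35.000", 10, 1),
    (46, "2018-12-23 08:05:35.000", 11, 1),
    (46, "2018-12-23 08:20:35.000", 12, 1),
    (46, "2018-12-23 09:10:35.000", 13, 1),
    (46, "2018-12-23 09:42:35.000", 14, 0),
    (46, "2018-12-23 10:17:35.000", 15, 1),
]

# SQLite port of the lead()-based answer (timestamp::date -> date(timestamp)).
QUERY = """
with raw_data as (
    select user_id, timestamp, date(timestamp) as event_date, event_id, tag,
           lead(tag, 1) over (partition by user_id, date(timestamp) order by timestamp) as next_tag,
           lead(tag, 2) over (partition by user_id, date(timestamp) order by timestamp) as next_tag_1
    from test
)
select user_id, event_date, min(timestamp) as first_event,
       count(distinct event_id) as number_of_times
from raw_data
where tag = 1 and next_tag = 1 and next_tag_1 = 1
group by user_id, event_date
"""


def run(rows):
    """Load rows into an in-memory table named test and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table test (user_id integer, timestamp text, event_id integer, tag integer)")
        conn.executemany("insert into test values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run(SAMPLE))  # [(46, '2018-12-23', '2018-12-23 07:26:35.000', 4)]
```

The four matches are events 9, 10, 8 and 11 (in timestamp order), each followed by two more tag = 1 events.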
Upvotes: 0
Reputation: 1269445
This is a gaps and islands problem. The difference of row numbers seems like the best approach:
To get all the adjacent values:
select user_id, min(timestamp) as timestamp,
count(*) as num_consecutive,
min(event_id) as min_event_id
from (select t.*,
row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
from t
) t
group by user_id, timestamp::date, tag, (seqnum - seqnum_t);
If you want each sequence individually, just add where tag = 1 and having count(*) >= 3 to this query.
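The difference-of-row-numbers query, with the where tag = 1 and having count(*) >= 3 filters added, can be checked locally with Python's built-in sqlite3 module. Assumptions: SQLite 3.25+ instead of the original database, date(timestamp) in place of timestamp::date, the min(timestamp) alias renamed to first_ts so it cannot clash with the column, a table named test, and only user 46's rows for 2018-12-23.

```python
import sqlite3

# Illustrative subset of the question's table: user 46 on 2018-12-23.
SAMPLE = [
    (46, "2018-12-23 06:42:35.000", 6, 0),
    (46, "2018-12-23 06:11:35.000", 7, 1),
    (46, "2018-12-23 07:51:35.000", 8, 1),
    (46, "2018-12-23 07:26:35.000", 9, 1),
    (46, "2018-12-23 07:37:35.000", 10, 1),
    (46, "2018-12-23 08:05:35.000", 11, 1),
    (46, "2018-12-23 08:20:35.000", 12, 1),
    (46, "2018-12-23 09:10:35.000", 13, 1),
    (46, "2018-12-23 09:42:35.000", 14, 0),
    (46, "2018-12-23 10:17:35.000", 15, 1),
]

# Within one island of equal tags both row numbers grow by 1 per row,
# so seqnum - seqnum_t stays constant and identifies the island.
QUERY = """
select user_id, min(timestamp) as first_ts,
       count(*) as num_consecutive,
       min(event_id) as min_event_id
from (select t.*,
             row_number() over (partition by user_id, date(timestamp) order by timestamp) as seqnum,
             row_number() over (partition by user_id, date(timestamp), tag order by timestamp) as seqnum_t
      from test t
     ) t
where tag = 1
group by user_id, date(timestamp), tag, (seqnum - seqnum_t)
having count(*) >= 3
"""


def islands(rows):
    """Load rows into an in-memory table named test and return islands of 3+ ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table test (user_id integer, timestamp text, event_id integer, tag integer)")
        conn.executemany("insert into test values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(islands(SAMPLE))  # [(46, '2018-12-23 07:26:35.000', 6, 8)]
```

Because the sample timestamps are not in event_id order, the island found here starts with event 9 at 07:26:35, and min(event_id) reports 8 rather than the id of its earliest event; ordering both row_number() calls by event_id would give the island of events 7 to 13 instead.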
To convert this to your desired result set, use a subquery:
select user_id, min(min_event_id) as min_event_id, min(timestamp) as first_timestamp,
       sum(num_consecutive / 3) as num_triplets  -- integer division per island
from (select user_id, min(timestamp) as timestamp,
             count(*) as num_consecutive,
             min(event_id) as min_event_id
      from (select t.*,
                   row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
                   row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
            from t
           ) t
      where tag = 1
      group by user_id, timestamp::date, tag, (seqnum - seqnum_t)
     ) t
where num_consecutive >= 3
group by user_id, timestamp::date;
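For the [UPDATE] (one row per triplet rather than one row per day), the difference-of-row-numbers idea can be extended by numbering the rows inside each tag = 1 island and cutting the island into blocks of three. This is a hypothetical sketch, not part of the answer above: it runs on SQLite through Python's sqlite3 module (3.25+ for window functions), uses date(timestamp) in place of timestamp::date, and orders rows by event_id instead of timestamp, because that is the order that reproduces the asker's expected rows (events 7 and 10 for user 46 on 2018-12-23). The CTE names numbered, positioned and triplets and the function per_triplet are illustrative.

```python
import sqlite3

# The question's sample table, copied as-is: user_id | timestamp | event_id | tag
DATA = """
46 | 2018-12-21 08:42:35.000 | 1 | 0
46 | 2018-12-21 09:58:35.000 | 2 | 1
46 | 2018-12-22 06:42:35.000 | 3 | 0
46 | 2018-12-22 07:18:35.000 | 4 | 1
46 | 2018-12-22 08:30:35.000 | 5 | 1
46 | 2018-12-23 06:42:35.000 | 6 | 0
46 | 2018-12-23 06:11:35.000 | 7 | 1
46 | 2018-12-23 07:51:35.000 | 8 | 1
46 | 2018-12-23 07:26:35.000 | 9 | 1
46 | 2018-12-23 07:37:35.000 | 10 | 1
46 | 2018-12-23 08:05:35.000 | 11 | 1
46 | 2018-12-23 08:20:35.000 | 12 | 1
46 | 2018-12-23 09:10:35.000 | 13 | 1
46 | 2018-12-23 09:42:35.000 | 14 | 0
46 | 2018-12-23 10:17:35.000 | 15 | 1
46 | 2018-12-24 09:42:35.000 | 16 | 0
46 | 2018-12-24 10:45:35.000 | 17 | 0
46 | 2018-12-24 11:12:35.000 | 18 | 0
46 | 2018-12-24 11:51:35.000 | 19 | 1
122 | 2018-12-22 08:30:35.000 | 1 | 1
122 | 2018-12-23 06:42:35.000 | 2 | 0
122 | 2018-12-23 06:11:35.000 | 3 | 1
122 | 2018-12-23 07:51:35.000 | 4 | 1
122 | 2018-12-23 07:26:35.000 | 5 | 1
122 | 2018-12-23 07:37:35.000 | 6 | 1
122 | 2018-12-28 06:42:35.000 | 1 | 0
122 | 2018-12-28 06:38:35.000 | 2 | 1
122 | 2018-12-28 07:51:35.000 | 3 | 1
122 | 2018-12-28 07:26:35.000 | 4 | 1
122 | 2018-12-28 08:42:35.000 | 5 | 0
122 | 2018-12-28 09:38:35.000 | 6 | 0
122 | 2018-12-28 10:51:35.000 | 7 | 0
122 | 2018-12-28 11:26:35.000 | 8 | 0
"""

QUERY = """
with numbered as (      -- island id: difference of two row numbers
    select t.*,
           row_number() over (partition by user_id, date(timestamp) order by event_id)
         - row_number() over (partition by user_id, date(timestamp), tag order by event_id) as island
    from test t
),
positioned as (         -- 0-based position of each tag = 1 row inside its island
    select n.*,
           row_number() over (partition by user_id, date(timestamp), island order by event_id) - 1 as pos
    from numbered n
    where tag = 1
),
triplets as (           -- positions 0-2, 3-5, ... form one triplet each
    select user_id,
           date(timestamp) as event_date,
           min(case when pos % 3 = 0 then timestamp end) as first_occurrence,
           min(case when pos % 3 = 0 then event_id end) as event_id
    from positioned
    group by user_id, date(timestamp), island, pos / 3
    having count(*) = 3  -- drop incomplete trailing blocks
)
select user_id, first_occurrence, event_id,
       count(*) over (partition by user_id, event_date) as consecutive_events
from triplets
order by user_id, first_occurrence
"""


def parse(text):
    """Turn the pipe-separated sample into (user_id, timestamp, event_id, tag) tuples."""
    rows = []
    for line in text.strip().splitlines():
        user_id, ts, event_id, tag = (part.strip() for part in line.split("|"))
        rows.append((int(user_id), ts, int(event_id), int(tag)))
    return rows


def per_triplet(rows):
    """Load rows into an in-memory table named test and return one row per triplet."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table test (user_id integer, timestamp text, event_id integer, tag integer)")
        conn.executemany("insert into test values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in per_triplet(parse(DATA)):
    print(row)
```

Against the full sample this prints one row per triplet: events 7 and 10 for user 46 (consecutive_events = 2), and events 3 and 2 for user 122 on 2018-12-23 and 2018-12-28. Swapping event_id for timestamp in the order by clauses gives the timestamp-ordered variant, which yields different triplets on this data.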
Upvotes: 1