Reputation: 345
I have a table with 4 columns: date, event_name, video_id and user_id.
I need to write a query that will return the number of users who viewed at least one video in a given day and the number of those users who returned the next day to view at least one video. The video_play event signifies that a video was played by a user.
Here's some sample data upon which I need to base my query:
date event_name video_id user_id
2018-01-01 video_play 51651561651 989189198
2018-01-01 video_play 98121651656 561884864
2018-01-01 video_play 78918918918 561884864
2018-01-01 create_video 32156541355 153215651
2018-01-01 create_video 87351531311 232135135
2018-01-02 create_video 51651561651 489846581
2018-01-02 video_play 12315315352 561884864
2018-01-02 create_video 32156541355 321351351
2018-01-02 create_video 87351531311 231513515
2018-01-02 video_play 98191891894 615616516
This is how the output should look:
date users_view_videos users_view_next_day
2018-01-01 2 1
2018-01-02 2 0
This query completes the first half of the querying task, but I cannot figure out the second half:
with t1 as
(select date,
count(distinct user_id) as users_view_next_day
from clickstream_videos
WHERE event_name = 'video_play'
group by 1)
select clickstream_videos.date,
count(distinct user_id) as users_view_videos,
lead(users_view_next_day, 1) over(order by clickstream_videos.date)
from clickstream_videos
join t1
on clickstream_videos.date = t1.date
WHERE event_name = 'video_play'
group by 1, t1.users_view_next_day
Thanks.
Upvotes: 0
Views: 90
Reputation: 147206
If you make a CTE
which lists all the distinct users who have watched a video on a given day, that can be used to give a count of users who viewed a video on a given day. You can then join it to itself on the condition that the date in the second table is the next day, and the user_id
is the same, to get the count of those who viewed videos on that day and the next one:
WITH viewers AS (
SELECT DISTINCT date, user_id
FROM clickstream_videos
WHERE event_name = 'video_play'
)
SELECT v1.date,
COUNT(v1.user_id) AS user_view_videos,
COUNT(v2.user_id) AS users_view_next_day
FROM viewers v1
LEFT JOIN viewers v2 ON v2.date = v1.date + interval '1 day'
AND v2.user_id = v1.user_id
GROUP BY v1.date
Output:
date user_view_videos users_view_next_day
2018-01-01T00:00:00.000Z 2 1
2018-01-02T00:00:00.000Z 2 0
Upvotes: 2