Reputation: 466
I'm trying to print out a list of all activity for a random group of 500 users who starting using my app after Jan 1st.
with random_users as (select distinct id, min(timestamp) as first_event
from log
group by id
having first_event >= '2019-01-01'
order by random()
limit 500)
select random_users.id, log.timestamp, log.event
from random_users left join log on log.id = random_users.id
Getting a random selection of users is easily done using PostgreSQL's random()
, but when I try to combine this with the condition of having first_event >= '2019-01-01'
I'm getting some problems. Namely, timestamp
is actually showing as prior to 2019-01-01
for many users in the final results, something like this:
id timestamp event
5 2018-11-12 click
2 2018-12-27 purchase
7 2019-01-03 click
I am wondering if this is something to do with how the random()
function works, as similar queries without this give expected results How can I successfully limit the random()
function to groups of users who've used the app after 2019-01-01
?
Upvotes: 1
Views: 328
Reputation: 2208
Re-thinking this now that I fully understand what you're after. PostgreSQL has DISTINCT ON which you can use to select the first row matching certain conditions:
with user_first_events as (SELECT DISTINCT ON (id) id, timestamp, event
FROM log
WHERE timestamp >= '2019-01-01'
ORDER BY id, timestamp ASC)
SELECT * FROM user_first_events ORDER BY random() LIMIT 500
Upvotes: 1