FPL
FPL

Reputation: 466

PostgreSQL: why does random() not work in a join?

I'm trying to print out a list of all activity for a random group of 500 users who starting using my app after Jan 1st.

with random_users as (select distinct id, min(timestamp) as first_event
from log
group by id 
having first_event >= '2019-01-01'
order by random() 
limit 500)

select random_users.id, log.timestamp, log.event
from random_users left join log on log.id = random_users.id

Getting a random selection of users is easily done using PostgreSQL's random(), but when I try to combine this with the condition of having first_event >= '2019-01-01' I'm getting some problems. Namely, timestamp is actually showing as prior to 2019-01-01 for many users in the final results, something like this:

id    timestamp   event
5     2018-11-12  click
2     2018-12-27  purchase
7     2019-01-03  click

I am wondering if this is something to do with how the random() function works, as similar queries without this give expected results How can I successfully limit the random() function to groups of users who've used the app after 2019-01-01?

Upvotes: 1

Views: 328

Answers (1)

codenamev
codenamev

Reputation: 2208

Re-thinking this now that I fully understand what you're after. PostgreSQL has DISTINCT ON which you can use to select the first row matching certain conditions:

with user_first_events as (SELECT DISTINCT ON (id) id, timestamp, event
FROM log
WHERE timestamp >= '2019-01-01'
ORDER BY id, timestamp ASC)

SELECT * FROM user_first_events ORDER BY random() LIMIT 500

Upvotes: 1

Related Questions