Nathan Wienert
Nathan Wienert

Reputation: 1623

Can't get DISTINCT + GROUP to work with Postgres with two values

Trying to get the latest activities, with distinct user_id's, ordered by the "latest" (either order "id desc" or "created_at desc"):

SELECT DISTINCT ON (id, user_id) id,user_id 
FROM "activities" 
GROUP BY id,user_id 
ORDER BY id desc LIMIT 10

But that returns 10 results with the following user_ids:

2863, 2863, 2863, 2863, 2863, 2863, 2863, 2863, 2863, 2615

Which are not distinct, obviously. How would I get this so it has a distinct (grouped) user_id?

Edit: Perhaps I explained this poorly, I apologize. Really simply, I want the latest activities from unique user_id's. By latest I would prefer it by created_at, but by id works as well. And I only want one per user, no duplicates.

Edit again: Realized that first sentence really was throwing off the answerers. Sorry for throwing everyone off there, what I want is really pretty simple :/.

Upvotes: 2

Views: 1033

Answers (4)

Michael Buen
Michael Buen

Reputation: 39413

If you have some latest activities with ties, e.g.

user_id created_at      id
john    july 4, 2010    1
john    july 4, 2010    2
john    july 12, 2010   3        -- ties with id# 4
john    july 12, 2010   4        -- ties with id# 3
john    july 5, 2010    5
paul    july 13, 2010   6
paul    july 12, 2010   7

Use dense_rank:

with latest_activities as
(
    SELECT user_id, created_at, id,
        dense_rank() 
        over(partition by user_id order by created_at desc) as the_ranking
    FROM activities     
)
select * 
from latest_activities
where the_ranking = 1
order by user_id, id;

The query above will show:

user_id created_at      id 
john    july 12, 2010   3
john    july 12, 2010   4
paul    july 13, 2010   6

If you want the multiple ids per user to appear on one line only, use group_concat array_agg, it will then show this:

user_id created_at      ids
john    july 12, 2010   3, 4
paul    july 13, 2010   6

array_agg:

with latest_activities as
(
    SELECT user_id, created_at, id,
        dense_rank() 
        over(partition by user_id order by created_at desc) as the_ranking
    FROM activities     
)
select user_id, created_at, array_agg(id order by id) as ids
from latest_activities
where the_ranking = 1
group by user_id, created_at

Do note that the queries above are also working on data with no ties in them

Live test: http://www.sqlfiddle.com/#!12/c48ec/8

Upvotes: 1

Michael Buen
Michael Buen

Reputation: 39413

RE: I want the latest activities from unique user_id's

SELECT DISTINCT ON (user_id) user_id, created_at
FROM "activities" 
ORDER BY user_id, created_at desc

RE: Then how do I get the id's? I still need ids

Reuse the query above:

with latest_activities as
(
  SELECT DISTINCT ON (user_id) user_id, created_at
  FROM "activities" 
  ORDER BY user_id, created_at desc
)
select l.user_id, l.created_at, array_agg(a.id) as ids
from latest_activities l
join activities a using(user_id)
group by l.user_id, l.created_at

Upvotes: 1

Bohemian
Bohemian

Reputation: 425168

Try this:

select user_id, array_agg(id) as ids
from activities
group by user_id
having count(*) > 1;

The array_agg() function returns all values for the group as a comma-delimited list.

Edited:

It seems from the comment that you might just want this:

select distinct user_id
from activities;

Or possibly this, which only displays those user ids that have duplicates:

select user_id
from activities
group by user_id
having count(*) > 1;

Upvotes: 1

BellevueBob
BellevueBob

Reputation: 9618

I'm not exactly sure what you want, but try this:

SELECT id, user_id
FROM (
   SELECT id
        , user_id
        , ROW_NUMBER () OVER (PARTITION BY id
                              ORDER BY user_id DESC) as rn
   FROM "activities"
   ) as xx
WHERE rn = 1
ORDER BY id desc 
LIMIT 10

The ROW_NUMBER OLAP function will assign a counter within values of id.

Upvotes: 1

Related Questions