sdjkhduiueeuzie
sdjkhduiueeuzie

Reputation: 13

select DISTINCT but dont mess with the order of rows

If you want to play with my data I have added a playground to the very end of the post. Thank you.

SCHEMA

CREATE TABLE "public"."posts" (
    "id" bigint NOT NULL,
    "text" "text",
    "user_id" "uuid" NOT NULL
);

INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (173, 'iOS created', '9361a62b-462a-4ca3-ba1d-e0c2e0174aa0');
INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (174, 'chrome reply', '90a42a87-9f9b-4676-bc6a-214be0050053');
INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (175, 'brave reply', '00bf9dae-e5a7-4f86-a3fd-0be4630cffd4');
INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (176, '@1 chrome mention', '9361a62b-462a-4ca3-ba1d-e0c2e0174aa0');
INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (177, '@0 ios mention', '00bf9dae-e5a7-4f86-a3fd-0be4630cffd4');
INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (178, '', '00bf9dae-e5a7-4f86-a3fd-0be4630cffd4');
INSERT INTO "public"."posts" ("id", "text", "user_id" ) VALUES (179, '@2 mention brave', '9361a62b-462a-4ca3-ba1d-e0c2e0174aa0');

QUERY

select * from
(
  select 
  --DISTINCT ON (user_id)
  text,
  row_number() over() rank,
  user_id
  from
  (
    SELECT 
    id,
    text,
    user_id
    FROM posts
    order by id asc
  ) alias1
) alias2

Does what i want, but I want to GROUP BY user_id and keep the lowest rank possible. When using DISTINCT (user_id) * the rank is taken by what feels random, that is the issue.

Uncomment --DISTINCT ON (user_id) on the playground to see rank take random rows instead of taking the first rankings rows.

Output without DISTINCT

rank    user_id text
1   9361a62b-462a-4ca3-ba1d-e0c2e0174aa0    iOS created
2   90a42a87-9f9b-4676-bc6a-214be0050053    chrome reply
3   00bf9dae-e5a7-4f86-a3fd-0be4630cffd4    brave reply
4   9361a62b-462a-4ca3-ba1d-e0c2e0174aa0    @1 chrome mention
5   00bf9dae-e5a7-4f86-a3fd-0be4630cffd4    @0 ios mention
6   00bf9dae-e5a7-4f86-a3fd-0be4630cffd4    
7   9361a62b-462a-4ca3-ba1d-e0c2e0174aa0    @2 mention brave

Output with DISTINCT

rank    user_id text
5   00bf9dae-e5a7-4f86-a3fd-0be4630cffd4    @0 ios mention
2   90a42a87-9f9b-4676-bc6a-214be0050053    chrome reply
4   9361a62b-462a-4ca3-ba1d-e0c2e0174aa0    @1 chrome mention

Output i need:

rank    user_id text
1   9361a62b-462a-4ca3-ba1d-e0c2e0174aa0    iOS created
2   90a42a87-9f9b-4676-bc6a-214be0050053    chrome reply
3   00bf9dae-e5a7-4f86-a3fd-0be4630cffd4    brave reply

Query with my data: http://sqlfiddle.com/#!15/df83a/2/0

Upvotes: 0

Views: 59

Answers (2)

Cetin Basoz
Cetin Basoz

Reputation: 23867

You can use a subquery with a group by:

select posts.* from posts
inner join
(
  select 
  user_id, min(id) as minId
  from posts
  group by user_id
) p on p.user_id = posts.user_id and p.minId = posts.Id;

EDIT: To have something that you call 'rank' but is a row_number really:

select row_number() over (order by posts.id) as rank, 
   posts.user_id, posts.text from posts
inner join
(
  select 
  user_id, min(id) as minId
  from posts
  group by user_id
) p on p.user_id = posts.user_id and p.minId = posts.Id;

Upvotes: 0

sticky bit
sticky bit

Reputation: 37527

Tables, views, query results, etc. in relational databases represent relations. Relations are a special form of (multi) sets. Sets don't have an order. So the DBMS is free to deliver a result of a query in any order it "wants" unless there's an explicit ORDER BY. That's why you get "random" results.

So use proper ORDER BY clauses for DISTINCT ON and row_number(). Use an outer query with an ORDER BY to order the end result.

SELECT rank,
       user_id,
       text
       FROM (SELECT DISTINCT ON (user_id)
                    row_number() OVER (ORDER BY id) AS rank,
                    user_id,
                    text
                    FROM posts
                    ORDER BY user_id,
                             id) AS x
       ORDER BY rank;

db<>fiddle

Upvotes: 1

Related Questions