Marcus
Marcus

Reputation: 25

SELECT DISTINCT + ORDER BY additional expression

I have no experience with PostgreSQL and I am migrating a Rails5+MySQL application to Rails5+PostgreSQL and I am having a problem with a query.

I've already looked at some questions/answers and still haven't been able to solve my problem. My problem seems to be ridiculous, but I needed to ask for help here!

Query:

SELECT DISTINCT users.* FROM users 
INNER JOIN areas_users ON areas_users.user_id = users.id 
INNER JOIN areas ON areas.deleted_at IS NULL AND areas.id = areas_users.area_id 
WHERE users.deleted_at IS NULL AND users.company_id = 2 AND areas.id IN (2, 4, 5) 
ORDER BY CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END, users.id, 1 ASC

Running the query in DBeaver, returns the error:

SQL Error [42P10]: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

What do I need to do to be able to use this SELECT DISTINCT with this ORDER BY CASE?

Upvotes: 0

Views: 4920

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656311

It's like error message says:

for SELECT DISTINCT, ORDER BY expressions must appear in select list

This is an expression:

CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END

You cannot order by it, while doing SELECT DISTINCT users.* FROM ... because that only allows ORDER BY expressions that appear in the SELECT list.

Typically, the best solution for DISTINCT is not to use it in the first place. If you don't duplicate rows, you don't have to de-duplicate them later. See:

In your case, use an EXISTS semi-join (expression / subquery) instead of the joins. This avoids the duplication. Assuming distinct rows in table users, DISTINCT is out of job.

SELECT u.*
FROM   users u
WHERE  u.deleted_at IS NULL
AND    u.company_id = 2
AND    EXISTS (
   SELECT FROM areas_users au JOIN areas a ON a.id = au.area_id
   WHERE  au.user_id = u.id
   AND    a.id IN (2, 4, 5)
   AND    a.deleted_at IS NULL
   )
ORDER BY CASE u.id WHEN 3 THEN 0
                   WHEN 5 THEN 1 END, u.id, 1;  -- ①

Does what you request, and typically much faster, too.

Using simple ("switched") CASE syntax.

① There is still an ugly bit. Using a positional reference in ORDER BY can be convenient short syntax. But while you have SELECT *, it's a really bad idea. If the order of columns in the underlying table changes, your query is silently changed. Spell out the column in this use case!

(Typically, you don't need SELECT * in the first place, but just a selection of columns.)

IF your ID column is guaranteed to have positive numbers, this would be a bit faster:

...
ORDER BY CASE u.id WHEN 3 THEN -2
                   WHEN 5 THEN -1
                   ELSE u.id END, <name_of_first_column>

I MUST use DISTINCT

(Really?) If you insist:

SELECT DISTINCT CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END AS order_column, u.*
FROM   users u
JOIN   areas_users au ON au.user_id = u.id
JOIN   areas a ON a.id = au.area_id
WHERE  u.deleted_at IS NULL
AND    u.company_id = 2
AND    a.id IN (2, 4, 5)
AND    a.deleted_at IS NULL
ORDER  BY 1, <name_of_previously_first_column>;  -- now, "ORDER BY 1" is ok

You get the additional column order_column in the result. You can wrap it in a subquery with a different SELECT ...

Just a proof of concept. Don't use this.

Or DISTINCT ON?

SELECT DISTINCT ON (CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>)
       u.*
FROM   users u
JOIN   areas_users au ON au.user_id = u.id
JOIN   areas a ON a.id = au.area_id
WHERE  u.deleted_at IS NULL
AND    u.company_id = 2
AND    a.id IN (2, 4, 5)
AND    a.deleted_at IS NULL
ORDER  BY CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>;

This works without returning an additional column. Still just proof of concept. Don't use it, the EXISTS query is much cheaper.

See:

Upvotes: 1

Related Questions