Reputation: 25
I have no experience with PostgreSQL and I am migrating a Rails5+MySQL application to Rails5+PostgreSQL and I am having a problem with a query.
I've already looked at some questions/answers and still haven't been able to solve my problem. My problem seems to be ridiculous, but I needed to ask for help here!
Query:
SELECT DISTINCT users.* FROM users
INNER JOIN areas_users ON areas_users.user_id = users.id
INNER JOIN areas ON areas.deleted_at IS NULL AND areas.id = areas_users.area_id
WHERE users.deleted_at IS NULL AND users.company_id = 2 AND areas.id IN (2, 4, 5)
ORDER BY CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END, users.id, 1 ASC
Running the query in DBeaver, returns the error:
SQL Error [42P10]: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
What do I need to do to be able to use this SELECT DISTINCT
with this ORDER BY CASE
?
Upvotes: 0
Views: 4920
Reputation: 656311
It's like error message says:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
This is an expression:
CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END
You cannot order by it, while doing SELECT DISTINCT users.* FROM ...
because that only allows ORDER BY
expressions that appear in the SELECT
list.
Typically, the best solution for DISTINCT
is not to use it in the first place. If you don't duplicate rows, you don't have to de-duplicate them later. See:
In your case, use an EXISTS
semi-join (expression / subquery) instead of the joins. This avoids the duplication. Assuming distinct rows in table users
, DISTINCT
is out of job.
SELECT u.*
FROM users u
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND EXISTS (
SELECT FROM areas_users au JOIN areas a ON a.id = au.area_id
WHERE au.user_id = u.id
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
)
ORDER BY CASE u.id WHEN 3 THEN 0
WHEN 5 THEN 1 END, u.id, 1; -- ①
Does what you request, and typically much faster, too.
Using simple ("switched") CASE
syntax.
① There is still an ugly bit. Using a positional reference in ORDER BY
can be convenient short syntax. But while you have SELECT *
, it's a really bad idea. If the order of columns in the underlying table changes, your query is silently changed. Spell out the column in this use case!
(Typically, you don't need SELECT *
in the first place, but just a selection of columns.)
IF your ID column is guaranteed to have positive numbers, this would be a bit faster:
...
ORDER BY CASE u.id WHEN 3 THEN -2
WHEN 5 THEN -1
ELSE u.id END, <name_of_first_column>
DISTINCT
(Really?) If you insist:
SELECT DISTINCT CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END AS order_column, u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY 1, <name_of_previously_first_column>; -- now, "ORDER BY 1" is ok
You get the additional column order_column
in the result. You can wrap it in a subquery with a different SELECT
...
Just a proof of concept. Don't use this.
DISTINCT ON
?SELECT DISTINCT ON (CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>)
u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>;
This works without returning an additional column. Still just proof of concept. Don't use it, the EXISTS
query is much cheaper.
See:
Upvotes: 1