babis21
babis21

Reputation: 1900

Postgresql ORDER BY not working as expected

Let's try this simple example to represent the problem I'm facing.

Assume this table:

CREATE TABLE testing1
(
    id serial NOT NULL,
    word text,
    CONSTRAINT testing1_pkey PRIMARY KEY (id)
);

and that data:

insert into testing1 (word) values ('Heliod, God');
insert into testing1 (word) values ('Heliod''s Inter');
insert into testing1 (word) values ('Heliod''s Pilg');
insert into testing1 (word) values ('Heliod, Sun');

Then I want to run this query to get the results ordered by the word column:

SELECT
    id, word
FROM testing1
WHERE UPPER(word::text) LIKE UPPER('heliod%') 
ORDER BY word asc;

But look at the output, it's not ordered. I would expect the rows to be in that order, using their ids: 2, 3, 1, 4 (or, if I use the word's values: Heliod's Inter, Heliod's Pilg, Heliod, God, Heliod, Sun). This is what I get:

query 1

I thought that maybe something could confuse postgresql because of the WHERE criteria I used, but the below happens if I just order by on the rows:

query 2

Am I missing something here? I couldn't find anything in the docs about ordering values that contain quotes (I suspect that the quotes cause that behaviour because of their special meaning in postgresql, but I may be wrong).

I am using UTF-8 encoding for my database (not sure if it matters though) and this issue is happening on Postgresql version 12.7.

The output of show lc_ctype; is

"en_GB.UTF-8"

and the output of show lc_collate; is

"en_GB.UTF-8"

Upvotes: 3

Views: 2284

Answers (2)

babis21
babis21

Reputation: 1900

Indeed, I've tried @jjanes's suggestion to use the C collation and the output is the one I would expect:

SELECT
    id, word
FROM testing1
ORDER BY word collate "C" ;

enter image description here

How weird, I have been using postgresql for some years now and I never noticed that behaviour.

Relevant section from the docs:

23.2.2.1. Standard Collations

On all platforms, the collations named default, C, and POSIX are available. > Additional collations may be available depending on operating system support. The default collation selects the LC_COLLATE and LC_CTYPE values specified at database creation time. The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

Upvotes: 0

jjanes
jjanes

Reputation: 44403

That is the correct way to order the rows in en_US.UTF-8. It does 'weird' (to someone used to ASCII) things with punctuation and whitespace, skipping on a first pass and considering it only for otherwise tied values.

If you don't want those rules, maybe use the C collation instead.

Upvotes: 5

Related Questions