Postgres ORDER BY returns rows in the wrong order:
postgres=# SELECT (url) FROM posts_post ORDER BY url;
 url
-----------------------------------------------------------------------------------------
http://nautil.us/issue/70/variables/aging-is-a-communication-breakdown
https://github.com/felixse/FluentTerminal
http://www.bbc.com/future/story/20160408-the-ancient-peruvian-mystery-solved-from-space
http://www.graffathon.fi/2016/presentations/additive_slides.pdf
(4 rows)
As you can see, there is a problem with "http://nautil.us/issue/70/variables/aging-is-a-communication-breakdown": the rows don't come back in the expected order.
I saved the parsed results to Postgres using Python and psycopg2, and I've reached a point where I can't test sorting, because Postgres returns the ORDER BY results in the wrong order.
Update: to reproduce:
CREATE TABLE test_post ("id" serial NOT NULL PRIMARY KEY, "title" text NOT NULL, "url" text NOT NULL, "created" timestamp with time zone NOT NULL);
INSERT INTO test_post (title, url, created) VALUES ('Aging Is', 'http://nautil.us/issue/70/variables/aging-is-a-communication-breakdown', NOW()) ON CONFLICT DO NOTHING;
INSERT INTO test_post (title, url, created) VALUES ('Untrusted – a user', 'https://github.com/felixse/FluentTerminal', NOW()) ON CONFLICT DO NOTHING;
INSERT INTO test_post (title, url, created) VALUES ('Artyping (1939)', 'http://www.bbc.com/future/story/20160408-the-ancient-peruvian-mystery-solved-from-space', NOW()) ON CONFLICT DO NOTHING;
INSERT INTO test_post (title, url, created) VALUES (' Applying the Universal', 'http://www.graffathon.fi/2016/presentations/additive_slides.pdf', NOW()) ON CONFLICT DO NOTHING;
SELECT (url) FROM test_post ORDER BY url;
PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
Assuming that you're using UTF8 encoding, specifying the collation explicitly instead of accepting the default should fix your immediate problem. In the example below the url column uses ucs_basic, which sorts by Unicode code point. Whether this is the right thing to do is application-dependent.
There are several different ways to specify the collation: when the database cluster is initialized, when the database is created, when a column is defined (as below), in an individual query, and so on. See "Collation Support" in the PostgreSQL documentation for much more detail.
CREATE TABLE test_post (
"id" serial NOT NULL PRIMARY KEY,
"title" text NOT NULL,
"url" text collate ucs_basic NOT NULL,
"created" timestamp with time zone NOT NULL
);
INSERT INTO test_post (title, url, created) VALUES
('Aging Is', 'http://nautil.us/issue/70/variables/aging-is-a-communication-breakdown', NOW()) ON CONFLICT DO NOTHING;
INSERT INTO test_post (title, url, created) VALUES
('Untrusted – a user', 'https://github.com/felixse/FluentTerminal', NOW()) ON CONFLICT DO NOTHING;
INSERT INTO test_post (title, url, created) VALUES
('Artyping (1939)', 'http://www.bbc.com/future/story/20160408-the-ancient-peruvian-mystery-solved-from-space', NOW()) ON CONFLICT DO NOTHING;
INSERT INTO test_post (title, url, created) VALUES
(' Applying the Universal', 'http://www.graffathon.fi/2016/presentations/additive_slides.pdf', NOW()) ON CONFLICT DO NOTHING;
SELECT (url) FROM test_post ORDER BY url;
http://nautil.us/issue/70/variables/aging-is-a-communication-breakdown
http://www.bbc.com/future/story/20160408-the-ancient-peruvian-mystery-solved-from-space
http://www.graffathon.fi/2016/presentations/additive_slides.pdf
https://github.com/felixse/FluentTerminal
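To see why the two orders differ, here is a rough Python approximation. It is not PostgreSQL's actual implementation. It assumes the database's default collation is a glibc locale such as en_US.UTF-8, which (in the glibc versions of that era) ignores punctuation like ':', '/' and '.' on the first comparison pass. `locale_like_key` is a hypothetical helper that only mimics that pass. Python's plain string sort compares code points, which for these ASCII URLs matches ucs_basic.

```python
# Rough illustration only: compares a code-point sort (what ucs_basic does)
# with a simplified imitation of a glibc locale's first comparison pass.
URLS = [
    "http://nautil.us/issue/70/variables/aging-is-a-communication-breakdown",
    "https://github.com/felixse/FluentTerminal",
    "http://www.bbc.com/future/story/20160408-the-ancient-peruvian-mystery-solved-from-space",
    "http://www.graffathon.fi/2016/presentations/additive_slides.pdf",
]


def locale_like_key(s: str) -> str:
    """Hypothetical helper: drop punctuation and case, keeping letters and digits.

    This only approximates the first pass of a locale such as en_US.UTF-8
    (an assumption); real collations also break ties on later passes.
    """
    return "".join(ch.lower() for ch in s if ch.isalnum())


def code_point_order(urls):
    # Python compares str by Unicode code point, like ucs_basic.
    # ':' (U+003A) sorts before 's', so every "http:" URL precedes "https:".
    return sorted(urls)


def locale_like_order(urls):
    # With punctuation ignored, "https..." becomes "httpsgithub..." and
    # lands between "httpnautil..." and "httpwww...".
    return sorted(urls, key=locale_like_key)


if __name__ == "__main__":
    print("code point order:")
    for u in code_point_order(URLS):
        print(" ", u)
    print("locale-like order:")
    for u in locale_like_order(URLS):
        print(" ", u)
```

Running it prints the first list in the same sequence as the ucs_basic output above and the second in the same sequence as the original query, which is consistent with (though not proof of) a punctuation-insensitive default collation.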