Reputation: 3295
It's the biggest surprise on the part of Postgres I've ever had to deal with. I made this simple test on ubuntu 14.04:
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.0 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55), 64-bit
(1 row)
postgres=# create table test (val text);
CREATE TABLE
postgres=# insert into test values ('a'), ('b'), ('-a'), ('-b');
INSERT 0 4
postgres=# select * from test order by val;
val
-----
a
-a
b
-b
(4 rows)
Obviously this is not what can be considered correct. A similar result can be seen on sqlfiddle that uses
PostgreSQL version 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
The same test on Windows servers
PostgreSQL 9.4.0, compiled by Visual C++ build 1800, 64-bit
PostgreSQL 9.1.2, compiled by Visual C++ build 1500, 32-bit
gives the correct result:
postgres=# select version();
version
-------------------------------------------------------------
PostgreSQL 9.4.0, compiled by Visual C++ build 1800, 64-bit
(1 row)
postgres=# create table test (val text);
CREATE TABLE
postgres=# insert into test values ('a'), ('b'), ('-a'), ('-b');
INSERT 0 4
postgres=# select * from test order by val;
val
-----
-a
-b
a
b
(4 rows)
I do not know how to deal with it. I need a stable server on Ubuntu, but now lost confidence in this version. Ubuntu users, please make the test and share the results. Any tips would be appreciated of course.
Thanks to Simo I found out what follows.
Postgres has only two built-in collations: C and POSIX. Any other collations are provided by operating system.
Postgres 9.3 server on sqlfiddle uses lc_collate = en_US.utf8
and my server uses pl_PL.utf8
, what means the collations are provided by Ubuntu. To check it out I ran sort on terminal and obtained the same results:
me@ubu:~$ sort
a
b
-a
-b
a
-a
b
-b
Now I know the culprit, but it still does not solve my problem.
Upvotes: 1
Views: 139
Reputation: 61526
my server uses pl_PL.utf8, what means the collations are provided by Ubuntu
Yes, but this can be overriden in various ways.
An individual expression can be assigned a collation, which is particularly relevant in an ORDER BY
clause. If you use:
select * from test order by val COLLATE "C";
This will sort as what is to you the "correct result", independently of lc_collate
.
An individual column can also be assigned a collation so that it doesn't have to be explicitly mentioned in every sort or comparison:
create table test (val text COLLATE "C");
A database can also be assigned a collation that differs from the server's default, see LC_COLLATE
clause in CREATE DATABASE.
See Collation support in PostgreSQL manual for more.
Upvotes: 2
Reputation: 121604
Unfortunately you will probably have to accept this. In short, on GNU systems in UTF locales all non alphanumeric characters are ignored during sorting. You can find more detailed explanation in this answer.
Upvotes: 2