Arnaud Perrier
Arnaud Perrier

Reputation: 81

Postgresql 12 - Collation not working Windows / Linux

I've installed Postgresql 12 on both Wnindows and Linux CentOS 8.

For my project, I needed to create my own ICU Collation and apply it to all character columns (either in column creation or order by requests = preferred).

Before doing so, I tried to make a simple test case to check if it's works as expected ... and it's not.

A simple table with some numeric and alphabetic data

DROP TABLE IF EXISTS TBL;
CREATE TABLE TBL ( TEXT1 CHARACTER(5), TEXT2 CHARACTER(5) );
INSERT INTO TBL VALUES
 ('aaaaa', 'aaaaa')
,('aaaaa', '00000')
,('aaaaa', 'bbbbb')
,('aaaaa', '11111')

,('bbbbb', '22222')
,('00000', '22222')
,('ccccc', '22222')
,('11111', '22222');

The collation to order digits after latin characters from the official documentation https://www.postgresql.org/docs/12/collation.html

CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit');
    Sort digits after Latin letters. (The default is digits before letters.)

The testing requests

1/ SELECT * FROM TBL;
2/ SELECT * FROM TBL ORDER BY TEXT1, TEXT2;
3/ SELECT * FROM TBL ORDER BY TEXT1 COLLATE digitslast;
4/ SELECT * FROM TBL ORDER BY TEXT2 COLLATE digitslast;
5/ SELECT * FROM TBL ORDER BY TEXT1 COLLATE digitslast, TEXT2 COLLATE digitslast;
6/ SELECT * FROM TBL ORDER BY TEXT2 COLLATE digitslast, TEXT1 COLLATE digitslast;

The results on Windows = collation not works at all

1/ OK = TEXT1 + TEXT2 = digits before letters
2/ OK = TEXT1 + TEXT2 = digits before letters
3/ KO = TEXT1 + TEXT2 = digits before letters
4/ KO = TEXT1 + TEXT2 = digits before letters
5/ KO = TEXT1 + TEXT2 = digits before letters
6/ KO = TEXT1 + TEXT2 = digits before letters

The results on Linux Centos 8 = collation works only for 1st column from ORDER BY

1/ OK = TEXT1 + TEXT2 = digits before letters
2/ OK = TEXT1 + TEXT2 = digits before letters
3/ OK = TEXT1 = digits after letters + TEXT2 = digits before letters
4/ OK = TEXT1 = digits before letters + TEXT2 = digits after letters
5/ KO = TEXT1 = digits after letters + TEXT2 = digits before letters
6/ KO = TEXT1 = digits before letters + TEXT2 = digits after letters

If I applied the COLLATE clause on the table columns, it's not working also.

ICU 53 is packaged to all Postgresql 12 installation and --with-icu argument is present on Postgresql configuration.

Does anybody has any clue to make this simple test case works ?

There is an extension proprosal for ICU from https://postgresql.verite.pro/blog/2018/07/25/icu-extension.html.

Another linked issue has no response for Postgresql 10 : How can I use Postgres collation in Windows OS?

Related posts from Postgresql blog

Upvotes: 5

Views: 906

Answers (1)

Arnaud Perrier
Arnaud Perrier

Reputation: 81

I posted my issue to postgresql bugs (#16570) and got some answers from postgresql maintainers.

https://www.postgresql.org/message-id/16570-58cc04e1a6ef3c3f%40postgresql.org

Postgresql Windows installer from EnterpriseDb come with ICU version 53. Postgresql Linux Centos 8 installer come with ICU version 60.

Before ICU version 54, the parsing of collation attributes are done by Postgres. The parsing method does not implement the colReorder attribute (Posgtesql documentation must be updated).

The Postgreql documentation precise that "The examples using the k* subtags require at least ICU version 54."

CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit'); ===> for ICU >= 54

CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit'); ===> for ICU < 54 (colReorder is not implemented : no way to make this works)

A new issue has been created to update ICU version on Windows installer (due to technical reasons ; like reindex, version compatibility, etc... ; it will be difficult to make it available for Postgresql release 13). https://www.postgresql.org/message-id/[email protected]

On Linux Centos 8 with ICU version 60, there is still an issue when applying the collation on multiple arguments under ORDER BY clause (only the collation on the first argument is taken into account).

Upvotes: 3

Related Questions