bash-

Reputation: 6304

How do databases sort Chinese characters?

I am writing a web app that needs to order a set of Chinese characters. Do databases sort Chinese characters, and if so, by what rule?

For reference I will be using PostgreSQL.

Upvotes: 3

Views: 4060

Answers (2)

Peter Eisentraut

Reputation: 36729

PostgreSQL sorts text using the operating system's locale facility, so you get exactly the same ordering that operating system tools such as sort give you. Set your locale to something useful, such as zh_HK.utf8, when you initialize the database system.
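To preview what a given OS locale would do before committing to it, here is a minimal Python sketch that sorts with the same C library collation. The helper name sort_with_locale is made up for illustration, and it assumes the locale you pass in is actually installed on the machine (otherwise setlocale raises locale.Error).

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words with the OS collation rules of locale_name.

    Hypothetical helper: raises locale.Error if the locale is not installed.
    """
    # Remember the current collation setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares per the locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# The "C" locale always exists; swap in e.g. "zh_HK.utf8" if it is installed.
print(sort_with_locale(["b", "a", "c"], "C"))
```

Whether the result for a Chinese locale matches what you expect depends entirely on the locale data shipped with your OS, so it is worth checking on the actual server.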

If you don't like the results of that sort, you'll have to come up with a custom solution.

Upvotes: 1

Thilo

Reputation: 262534

The easiest and most common way to sort them is as binary data, either by Unicode code point or, even more simply, as raw bytes (which works well for ASCII data). Unfortunately, that does not give a very meaningful sort order. It does group strings with a common prefix together, though, so things like prefix queries should work.
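As a rough illustration of the binary approach, Python's default string comparison is by code point, and sorting the UTF-8 encoded bytes gives the same order. The sample characters are just ones I picked for the example.

```python
# Sample characters chosen for illustration: U+4E2D, U+4E00, U+4EBA.
chars = ["中", "一", "人"]

# Python compares str values code point by code point.
by_code_point = sorted(chars)

# UTF-8 preserves code point order, so sorting the raw bytes agrees.
by_utf8_bytes = sorted(chars, key=lambda s: s.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

The order you get follows the Unicode code charts, not pronunciation or everyday dictionary conventions, which is why it is rarely what users expect.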

For meaningful sort order, there is no good algorithmic solution. You'd need to work with lookup tables (see for example this thread about mapping Chinese to pinyin, by which you could then sort).
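A lookup-table sort could look roughly like the sketch below. The PINYIN table here is a tiny hand-written sample with tones dropped, and pinyin_key is a made-up helper; a real application would need a full mapping and a policy for characters with more than one reading.

```python
# Tiny hand-written sample table (toneless pinyin); a real one would be far larger.
PINYIN = {
    "北": "bei", "京": "jing",
    "上": "shang", "海": "hai",
    "广": "guang", "州": "zhou",
}


def pinyin_key(word):
    """Build a sort key from per-character pinyin (hypothetical helper).

    Characters missing from the table fall back to themselves.
    """
    return tuple(PINYIN.get(ch, ch) for ch in word)


cities = ["上海", "北京", "广州"]
print(sorted(cities, key=pinyin_key))
```

In a database you would typically store the computed key in its own column and order by that, rather than computing it at query time.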

Upvotes: 0
