Dr.Kameleon

Reputation: 22820

Sort non-latin strings in Ruby

OK, so what I need is rather self-explanatory.

In the same way that .sort is used to sort an array of Latin-based strings alphabetically/lexicographically, I'm looking for a way to sort non-Latin UTF-8 strings.

Specifically:

And by "sorting", I mean the same order in which you would normally find them in a dictionary. (I know it can be a lot trickier for Chinese/Japanese, so let's just stick to the other languages first.)
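To make the problem concrete, here is a small sketch using three Russian words picked purely for illustration (they are not from the question). Ruby's default `String#<=>` compares strings byte by byte, and because UTF-8 byte order matches Unicode code point order, plain `.sort` follows the code chart rather than the alphabet. In Russian, ё is the 7th letter and я the last, yet ё (U+0451) has a higher code point than я (U+044F):

```ruby
# Illustrative words only: "pit", "fir tree", "spruce".
words = %w[яма ёлка ель]

# String#<=> compares bytes; for UTF-8 this equals code point order,
# not the order a Russian dictionary would use.
byte_sorted = words.sort
p byte_sorted # => ["ель", "яма", "ёлка"]

# Print the first code point of each word to show why:
# е is U+0435, я is U+044F, ё is U+0451.
words.each { |w| puts format("%s U+%04X", w, w.ord) }
```

A dictionary would place "ёлка" well before "яма", so byte order is not a usable substitute for collation.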

Any ideas?


P.S. I'm not interested in transliteration (that's what I'm currently doing), as the results are very far from "correct" - lexicographically speaking...


Note: It's not RoR-related. Just pure Ruby.

Upvotes: 0

Views: 927

Answers (1)

Frederick Cheung

Reputation: 84182

As you note, Unicode collation is tricky stuff - you almost certainly don't want to be doing it yourself.

The daddy of Unicode handling libraries is ICU. There are quite a lot of Ruby bindings for ICU, many of which look rather old, but ffi-icu seems reasonably active.

Twitter also maintains twitter-cldr-rb, which claims to include a full, pure-Ruby implementation of the Unicode Collation Algorithm.

Upvotes: 5
