Reputation: 22820
OK, so what I need is rather self-explanatory.
The same way .sort is used to alphabetically/lexicographically sort an array of Latin-based strings, I'm looking for a way to sort non-Latin UTF-8 strings.
Specifically:
And by "sorting", I mean the very same order in which you would normally find them in a dictionary. (I know it can be a lot trickier for Chinese/Japanese, so let's just stick to the rest of them first.)
Any ideas?
P.S. I'm not interested in transliteration (that's what I'm currently doing), as the results are very far from "correct" - lexicographically speaking...
Note: It's not RoR-related. Just pure Ruby.
Upvotes: 0
Views: 927
Reputation: 84182
As you note, Unicode collation is tricky stuff - you almost certainly don't want to be doing it yourself.
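To see why plain .sort is not enough, here is a minimal sketch in pure Ruby. The word list is only sample data I picked for illustration. String comparison in Ruby goes byte by byte, so UTF-8 strings come out in code point order rather than dictionary order.

```ruby
# Russian words for apple, fir tree and watermelon (illustrative sample data).
words = %w[яблоко ёлка арбуз]

# String#<=> compares bytes, so UTF-8 strings end up in code point order.
codepoint_sorted = words.sort
# => ["арбуз", "яблоко", "ёлка"]

# "ё" is U+0451, which lies after "я" (U+044F), so "ёлка" lands last,
# whereas a Russian dictionary files it next to the "е" words.
p codepoint_sorted
```

A collation library exists precisely to replace that byte-wise comparison with language-aware rules.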
The daddy of Unicode handling libraries is ICU. There are quite a few Ruby bindings for ICU, many of which look rather old, but ffi-icu seems reasonably active.
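A rough sketch of what sorting with ffi-icu could look like, going by the gem's README. It assumes the ffi-icu gem and the native ICU library are installed. The helper name icu_sort, the "ru" locale and the word list are my own choices for illustration, and the fallback to plain .sort is only there so the snippet still runs without the gem.

```ruby
# Try to load ffi-icu; it also needs the native ICU library on the system.
begin
  require 'ffi-icu'
  ICU_LOADED = true
rescue LoadError
  ICU_LOADED = false
end

# Hypothetical helper: sort words with ICU collation rules for a locale.
# Falls back to code point order when ffi-icu could not be loaded.
def icu_sort(words, locale = "ru")
  return words.sort unless ICU_LOADED

  collator = ICU::Collation::Collator.new(locale)
  collator.collate(words)
end

p icu_sort(%w[яблоко ёлка арбуз])
```

The collator object also has a compare method, so it should be usable inside an ordinary sort block if you need more control.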
Twitter also maintains twitter-cldr-rb, which claims to have a full, pure-Ruby implementation of the Unicode Collation Algorithm.
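For the pure-Ruby route, a minimal sketch along the lines of the twitter-cldr-rb README, assuming the twitter_cldr gem is installed. The helper name dictionary_sort and the word list are made up for illustration, and the fallback to plain .sort only keeps the snippet runnable when the gem is missing.

```ruby
# Try to load twitter-cldr-rb; no native library is needed for this one.
begin
  require 'twitter_cldr'
  CLDR_LOADED = true
rescue LoadError
  CLDR_LOADED = false
end

# Hypothetical helper: sort words with the default (root) Unicode collation.
# Falls back to code point order when the gem could not be loaded.
def dictionary_sort(words)
  return words.sort unless CLDR_LOADED

  collator = TwitterCldr::Collation::Collator.new
  collator.sort(words)
end

p dictionary_sort(%w[яблоко ёлка арбуз])
```

According to the README, the collator can also be created with a locale symbol to pick up language-specific tailoring, which is worth trying if the default order does not match your dictionary.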
Upvotes: 5