Reputation: 1185
Does it exist some established convention of sorting lines (characters)? Some convention which should play the similar role as PCRE for regular expressions.
For example, if you try to sort 0A1b-a2_B
(each character on its own line) with Sublime Text (Ctrl-F9) and Vim (:%sort
), the result will be the same (see below). However, I'm not sure it will be the same with another editors and IDEs.
-
0
1
2
A
B
_
a
b
Upvotes: 1
Views: 147
Reputation: 20812
There are two main ways of sorting character strings:
Lexicographic: numeric value of either the codepoint values or the code unit values or the serialized code unit values (bytes). For some character encodings, they would all be the same. The algorithm is very simple but this method is not human-friendly.
Culture/Locale-specific: an ordinal database for each supported culture is used. For the Unicode character set, it's called the CLDR. Also, in applying sorting for Unicode, sorting can respect grapheme clusters. A grapheme cluster is a base codepoint followed by a sequence of zero or more non-spacing (applied as extensions of the previous glyph) marks.
For some older character sets with one encoding, designed for only one or two scripts, the two methods might amount to the same thing.
Upvotes: 1
Reputation: 5780
Generally, characters are sorted based on their numeric value. While this used to only be applied to ASCII characters, this has also been adopted by unicode encodings as well. http://www.asciitable.com/
If no preference is given to the contrary, this is the de facto standard for sorting characters. Save for the actual alphabetical characters, the ordering is somewhat arbitrary.
Upvotes: 1