Reputation: 23098
I just saw this and I am wondering what is the best way to implement natural sorting like this?
Usually in a list of 1,4,11,12
the string sort (used in listing items) returns 1,11,12,4
. How do I implement a natural ordering?
Upvotes: 1
Views: 489
Reputation: 620
The items can also be
[ 'screen 4 episode 13', 'screen 11 episode 1', .... ]
For the above mentioned list and the sample list provided in the question, following method can be used:
Convert the number in the elements to a bucket based system, i.e calculate the maximum of the digit in any of the string. For example in the above list, the value is 2. Now convert the numbers in the elements in such a way that its length turns out to be of length as same as the maximum. Therefore,'screen 4 episode 13' will be converted to 'screen 04 episode 13' and 'screen 11 episode 1' will be converted to 'screen 11 episode 01'.
Now sort the modified list like a string.
Upvotes: 2
Reputation: 279315
You could split each string into a sequence of tokens. A token consists either of all non-digits or all digits. Then do a comparison on the sequence of tokens, instead of on the sequence of characters in the string. Non-digit tokens compare like strings, all-digit tokens compare against each other using their integer values.
It's up to you how all-digit tokens compare against non-digit tokens, but most likely you want foo123.txt
to appear after foo.txt
but before fooA.txt
. That means that when you compare the token foo
against the token foo<something>
, you don't produce an answer immediately based only on those two tokens -- you need to compare <something>
against the token following foo
.
That basic approach could then be optimized, to ensure you don't do any more string-splitting than is strictly necessary.
Upvotes: 1
Reputation: 10242
An efficient solution is to generate for every string to be sorted a key that can be compared lexicographically, and then use theses keys to sort the original strings.
To generate these keys, start with a copy of the original strings and then transform the substrings representing numbers as follow:
For instance:
1 -> 11
10 -> 210
9 -> 19
12345678 -> 812345678
987654321 -> 90987654321 // len = 9, int(len / 9) = 1, len % 9 = 0
9876543210 -> 919876543210 // len = 10, int(len / 9) = 1, len % 9 = 1
You may also like to replace punctuation signs on the keys in order to have "Foo 123" and "Foo-123" compare equally.
The Perl module Sort::Key::Natural uses this approach.
Upvotes: 1