Jesvin Jose
Jesvin Jose

Reputation: 23098

Simplest way to do "natural" sort for numbers in strings?

I just saw this and I am wondering what is the best way to implement natural sorting like this?

Usually in a list of 1,4,11,12 the string sort (used in listing items) returns 1,11,12,4. How do I implement a natural ordering?

enter image description here

Upvotes: 1

Views: 489

Answers (3)

dan-boa
dan-boa

Reputation: 620

The items can also be

[ 'screen 4 episode 13', 'screen 11 episode 1', .... ]

For the above mentioned list and the sample list provided in the question, following method can be used:

  1. Convert the number in the elements to a bucket based system, i.e calculate the maximum of the digit in any of the string. For example in the above list, the value is 2. Now convert the numbers in the elements in such a way that its length turns out to be of length as same as the maximum. Therefore,'screen 4 episode 13' will be converted to 'screen 04 episode 13' and 'screen 11 episode 1' will be converted to 'screen 11 episode 01'.

  2. Now sort the modified list like a string.

Upvotes: 2

Steve Jessop
Steve Jessop

Reputation: 279315

You could split each string into a sequence of tokens. A token consists either of all non-digits or all digits. Then do a comparison on the sequence of tokens, instead of on the sequence of characters in the string. Non-digit tokens compare like strings, all-digit tokens compare against each other using their integer values.

It's up to you how all-digit tokens compare against non-digit tokens, but most likely you want foo123.txt to appear after foo.txt but before fooA.txt. That means that when you compare the token foo against the token foo<something>, you don't produce an answer immediately based only on those two tokens -- you need to compare <something> against the token following foo.

That basic approach could then be optimized, to ensure you don't do any more string-splitting than is strictly necessary.

Upvotes: 1

salva
salva

Reputation: 10242

An efficient solution is to generate for every string to be sorted a key that can be compared lexicographically, and then use theses keys to sort the original strings.

To generate these keys, start with a copy of the original strings and then transform the substrings representing numbers as follow:

  • Remove any leading zeros.
  • Preppend the digits substring length encoded as the digit '9' repeated int(len/9) times followed by the digit representing len % 9.

For instance:

1          -> 11
10         -> 210
9          -> 19
12345678   -> 812345678
987654321  -> 90987654321    // len = 9, int(len / 9) = 1, len % 9 = 0
9876543210 -> 919876543210   // len = 10, int(len / 9) = 1, len % 9 = 1

You may also like to replace punctuation signs on the keys in order to have "Foo 123" and "Foo-123" compare equally.

The Perl module Sort::Key::Natural uses this approach.

Upvotes: 1

Related Questions