Rohaq

Reputation: 2046

Odd Perl 'dictionary' sort behaviour

I'm trying to implement the Tcl dictionary sort in Perl so that I can order some files. For those who don't know Tcl, its dictionary sort compares runs of consecutive digits by their numeric value rather than character by character. It's detailed here:

http://www.perlmonks.org/index.pl?node_id=160157

To summarise: A given array of:

qw(
  bigbang
  x10y
  x9y
  bigboy
  bigBoy
  x11y
)

is sorted by letter, case-insensitively, with a case-sensitive comparison as the tie-breaker. Any run of consecutive digits is interpreted as a single number and compared numerically, so the above comes out as:

qw(
    bigbang
    bigBoy
    bigboy
    x9y
    x10y
    x11y
)

Here x9y appears above x10y and x11y, whereas in a standard ASCII sort x10y and x11y would come above x9y, because 1 sorts before 9.
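To make the difference concrete, here is a minimal sketch of what Perl's default string sort does with the same list. It only demonstrates the plain ASCII ordering described above, not the dictionary sort itself; the variable names are just for illustration.

```perl
use strict;
use warnings;

my @names = qw(bigbang x10y x9y bigboy bigBoy x11y);

# The default sort compares strings character by character,
# so "x10y" sorts before "x9y" because "1" comes before "9".
my @plain = sort @names;

print "@plain\n";   # bigBoy bigbang bigboy x10y x11y x9y
```

Note that uppercase B also sorts before lowercase b here, which is why bigBoy jumps to the front.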

I attempted to implement Juerd's example from that link as a function. In my case, the sort mimics a Tcl dictionary sort perfectly when I give it a list of version numbers, like the following:

qw{ 1 1.0 1.01 1.2 1.02 1.0003 1.102 1.103 1.203 102a 102b 103a 103b 123 };

But when absolute paths are used for the files, the ordering goes wrong.

I've posted an example script below. If anyone can see why the function is going wrong, or can suggest a more modern alternative (since the example I worked from was posted 10 years ago :P), I would appreciate it.

http://pastebin.com/WM6QhzSK

And if you want to see a Tcl dictionary sort in action, check the link below:

http://pastebin.com/h3qMT4C2

Thanks in advance!

EDIT: Thanks to choroba for leading me to the solution! The working function is as follows:

sub dict_sort {
  my @unsorted = @_;
  my @sorted =
    map $_->[0],
    sort {
      my $i = 0;
      {
        my $A = $a->[1][$i];
        my $B = $b->[1][$i];
        defined($A) || defined($B)       # Stop if both undef
        and (
          defined($A) <=> defined($B)  # Defined wins over undef
          or (
            $A !~ /\d/ || $B !~ /\d/ # $A or $B is non-integer
            ?    (lc $A cmp lc $B)   # ?? Stringy lowercase
              || (   $A cmp    $B)   #    -> Tie breaker
            : $A <=> $B              # :: $A and $B are integers
              or (
                # Numerically equal: compare lengths, so "2" sorts before "02"
                length($A) <=> length($B)
              )
          )
          or ++$i && redo              # tie => next part
        );
      }
    }
  map [ $_, [ split /(\d+)/ ] ], @unsorted;
  return @sorted;
}
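
For anyone following along, the function depends on how split behaves when the pattern has capturing parentheses: the digit runs are kept as their own list elements instead of being thrown away. A small sketch of that tokenising step on its own (the helper name parts and the sample path are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into alternating
# non-digit and digit chunks, keeping the digit runs.
sub parts {
    my ($name) = @_;
    return split /(\d+)/, $name;
}

# Sample inputs, including a made-up absolute path
for my $name ('x10y', '1.02', '/tmp/v9/file10.txt') {
    print join(' | ', map { "'$_'" } parts($name)), "\n";
}
# 'x' | '10' | 'y'
# '' | '1' | '.' | '02'
# '/tmp/v' | '9' | '/file' | '10' | '.txt'
```

A string that starts with digits gets an empty first chunk, so text and number chunks always line up at the same index when two strings are compared.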

Upvotes: 3

Views: 371

Answers (1)

choroba

Reputation: 241918

Your code does not behave any differently for version strings. Just add 9.02 9.2 to the list, in this order, and you will see the same problem. If you want 02 to come after 2, you have to handle the case when $A == $B.

Update: That means adding "or length $A <=> length $B" after the "$A <=> $B" comparison.
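
A minimal sketch of that tie-breaker in isolation, assuming the digit chunks have already been split out of strings such as 9.2, 9.02 and 9.002 (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Digit chunks that are all numerically equal to 2
my @chunks = qw(02 2 002);

# <=> alone sees these as equal (it returns 0), so the length
# comparison decides: fewer leading zeroes sorts first.
my @with_length = sort { $a <=> $b or length($a) <=> length($b) } @chunks;

print "@with_length\n";   # 2 02 002
```

Without the length check the comparison returns 0 for every pair here, so the sort has nothing to tell the chunks apart.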

Upvotes: 3
