I'm running
'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object
Shouldn't I have gotten the following result?
s-a
S-tst
s-zaa
s-zf
srst2
ssrst
but instead I get the following:
s-a
srst2
ssrst
S-tst
s-zaa
s-zf
How is this possible? Does Sort-Object only look at letters when sorting? Is there any way to take special characters into account when sorting?
You can get ASCII-style order by sorting on each string's hex representation:
'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object {Format-Hex -InputObject $_}
If you need the sort to be case-insensitive, lowercase each string first:
'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object {Format-Hex -InputObject $_.ToLower()}
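A variation on the same idea builds the sort key directly from character codes instead of relying on Format-Hex's display formatting. This is a sketch: `Get-OrdinalSortKey` is a hypothetical helper name, not a built-in cmdlet. Each character becomes exactly four uppercase hex digits, so the keys contain only 0-9 and A-F and have a fixed width per character; comparing the keys as ordinary strings then gives the same order as comparing the original strings by character code.

```powershell
# Hypothetical helper (not a built-in): turn a string into a sort key made of
# 4 hex digits per UTF-16 code unit. Keys contain only 0-9 and A-F, and every
# character takes the same width, so sorting the keys yields character-code order.
function Get-OrdinalSortKey {
    param([string]$Text)
    -join ($Text.ToCharArray() | ForEach-Object { '{0:X4}' -f [int]$_ })
}

$words = 'S-tst', 'ssrst', 'srst2', 's-zaa', 's-a', 's-zf'

# Case-sensitive character-code order: 'S' (0x53) sorts before 's' (0x73).
$caseSensitive = $words | Sort-Object { Get-OrdinalSortKey $_ }

# Case-insensitive: build the key from an upper-cased copy of each string.
$ignoreCase = $words | Sort-Object { Get-OrdinalSortKey ($_.ToUpperInvariant()) }

$caseSensitive -join ', '   # S-tst, s-a, s-zaa, s-zf, srst2, ssrst
$ignoreCase -join ', '      # s-a, S-tst, s-zaa, s-zf, srst2, ssrst
```

Upper-casing mirrors what .NET's OrdinalIgnoreCase comparison does. Lower-casing, as in the line above, also works, but it places characters such as `^` and `_` (which sit between `Z` and `a` in ASCII) before the letters instead of after them.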
This behaviour is by design, but it's not always what people want or expect. If you want strings sorted by character code (ordinal order), ignoring case, use this:
Add-Type @"
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

public class SimpleStringComparer : IComparer, IComparer<string>
{
    private static readonly CompareInfo compareInfo = CompareInfo.GetCompareInfo(CultureInfo.InvariantCulture.Name);

    // Non-generic overload, used by e.g. ArrayList.Sort.
    public int Compare(object x, object y)
    {
        return Compare(x as string, y as string);
    }

    // Compares by character code, ignoring case.
    public int Compare(string x, string y)
    {
        return compareInfo.Compare(x, y, CompareOptions.OrdinalIgnoreCase);
    }

    public SimpleStringComparer() {}
}
"@
[string[]]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'
[System.Collections.Generic.List[string]]$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange($myList)
[SimpleStringComparer]$comparer = [SimpleStringComparer]::new()
$list.Sort($comparer)
$list
Outputs:
s'a
S'a
s'a1
S'a1
s-a
S-a
s-a1
S-a1
sa
Sa
sa1
Sa1
s^a
S^a
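If a plain ordinal comparison (case-sensitive or not) is all you need, the Add-Type step may be unnecessary: .NET's `System.StringComparer` already exposes `Ordinal` and `OrdinalIgnoreCase` comparers, which implement both `IComparer` and `IComparer<string>`. A minimal sketch using the strings from the question:

```powershell
$words = [string[]]('S-tst', 'ssrst', 'srst2', 's-zaa', 's-a', 's-zf')

# Case-insensitive ordinal order, using a built-in comparer instead of a custom class.
$ignoreCase = [System.Collections.Generic.List[string]]::new($words)
$ignoreCase.Sort([System.StringComparer]::OrdinalIgnoreCase)

# Case-sensitive ordinal order. [Array]::Sort sorts in place, so sort a copy.
$caseSensitive = [string[]]$words.Clone()
[Array]::Sort($caseSensitive, [System.StringComparer]::Ordinal)

$ignoreCase -join ', '      # s-a, S-tst, s-zaa, s-zf, srst2, ssrst
$caseSensitive -join ', '   # S-tst, s-a, s-zaa, s-zf, srst2, ssrst
```

The case-insensitive result is exactly the order the question expected, because `-` (0x2D) has a lower character code than any letter.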
More Info
Per @TessellatingHeckler in the comments, you can also sort on each string cast to a char array. However, as the output below shows, this is not true character-code (ordinal) order either (in ordinal order every uppercase S would sort before every lowercase s), and hyphens and apostrophes are still handled in a potentially unexpected way (they are effectively ignored):
$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'
$myList | Sort-Object -Property { [char[]] $_ }
s'a
S'a
s-a
S-a
s'a1
S'a1
s-a1
S-a1
s^a
S^a
sa
Sa
sa1
Sa1
The current sorting behaviour is by design. It appears that PowerShell implements a "Word Sort". This is documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx#SortingFunctions
In addition to ignoring hyphens and apostrophes (except when comparing otherwise identical strings), this sort also treats punctuation characters as coming before alphanumerics, and sorts accented letters alongside their unaccented counterparts. A simple demo of this can be seen like so:
32..255 | %{[string][char][byte]$_} | sort
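To see the difference for the strings in the question, you can compare pairs directly with `[string]::Compare` (culture-aware; Sort-Object's default is a case-insensitive, current-culture comparison) and `[string]::CompareOrdinal` (character codes only). This is a sketch; the culture-aware column can vary with the culture and the .NET version (PowerShell 7 on .NET 5+ may use ICU rather than the Windows word sort), so only the ordinal column is predictable.

```powershell
# Compare pairs from the question two ways. A negative sign means the left string sorts first.
$pairs = @(
    @('s-zaa', 'srst2'),
    @('S-tst', 's-a')
)

$results = foreach ($pair in $pairs) {
    [pscustomobject]@{
        Left    = $pair[0]
        Right   = $pair[1]
        # Culture-aware and case-insensitive, like Sort-Object's default.
        Culture = [Math]::Sign([string]::Compare($pair[0], $pair[1], [System.StringComparison]::CurrentCultureIgnoreCase))
        # Pure UTF-16 code unit comparison: '-' (0x2D) < 'r' (0x72), 'S' (0x53) < 's' (0x73).
        Ordinal = [Math]::Sign([string]::CompareOrdinal($pair[0], $pair[1]))
    }
}

$results | Format-Table
```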
To get other sorting behaviours, you'll currently need to dip into .NET, for example:
Add-Type @"
using System;
using System.Runtime.InteropServices;
using System.Collections;

public class NumericStringComparer : IComparer
{
    // https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
    // StrCmpLogicalW takes wide (UTF-16) strings, so marshal .NET strings as Unicode;
    // the default (ANSI) marshalling would pass it the wrong bytes.
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    public static extern int StrCmpLogicalW(string psz1, string psz2);

    public int Compare(object x, object y)
    {
        return Compare(x as string, y as string);
    }

    public int Compare(string x, string y)
    {
        return StrCmpLogicalW(x, y);
    }

    public NumericStringComparer() {}
}
"@
[System.Collections.ArrayList]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a', '100a','1a','001a','2a','20a'
$myList.Sort([NumericStringComparer]::new())
$myList -join ', '
The above sorts strings the way Windows Explorer would (shlwapi.dll is Windows-only): runs of digits are compared as numeric values, and the comparison is case-insensitive. So, for example, 2a sorts before 20a, which sorts before 100a, whereas a plain string sort would put 100a first.
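If P/Invoke into shlwapi.dll isn't an option (for example, PowerShell 7 on Linux or macOS), a common PowerShell idiom approximates numeric ordering by zero-padding every run of digits inside a Sort-Object script block. This is a sketch, not a reimplementation of StrCmpLogicalW: the padding width of 20 is an assumption (longer digit runs won't compare correctly), and the non-digit parts still go through Sort-Object's normal culture-aware comparison.

```powershell
$items = '100a', '1a', '2a', '20a'

# Pad every run of digits to 20 characters (an assumed maximum width), so that
# '2' becomes '00000000000000000002' and sorts below '00000000000000000020'.
# The inner script block is converted to a MatchEvaluator delegate.
$naturalOrder = $items | Sort-Object {
    [regex]::Replace($_, '\d+', { param($m) $m.Value.PadLeft(20, [char]'0') })
}

$naturalOrder -join ', '   # 1a, 2a, 20a, 100a
```

Unlike StrCmpLogicalW, this leaves the original strings untouched and only changes the sort key, so it works with any pipeline input.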
I've submitted a feature suggestion for more PowerShell-friendly sorting options on Sort-Object. See https://github.com/PowerShell/PowerShell/issues/4098