Reputation: 41
I'm trying to sort a large CSV file. The issue is with using sort-oject on a coloum called Combined. It doesn't sort the way I'd expect it to! For testing, I have removed all of the data that isn't being sorted and also reduced the ammount of lines.
Orginal CSV:
Combined
1A THE BIG
7
3A SPRING
19
LUZREN
21
23
25
29
1 HONEY
5
3 THE GOOD
11
ARVALA
BRASAID
13
MEADOWCLAW
Powershell:
Import-Csv orginal.csv -delimiter ',' | Sort-Object -Unique Combined
Combined
--------
1 HONEY
11
13
19
1A THE BIG
21
23
25
29
3 THE GOOD
3A SPRING
5
7
ARVALA
BRASAID
LUZREN
MEADOWCLAW
What I expected was this:
Combined
1A THE BIG
5
7
11
13
19
21
23
25
29
1 HONEY
3 THE GOOD
3A SPRING
ARVALA
BRASAID
LUZREN
MEADOWCLAW
In a perfect world I'd like this:
Combined
1A THE BIG
1 HONEY
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
ARVALA
BRASAID
LUZREN
MEADOWCLAW
Please can someone explain to me why this is happening I've banging my head againt my monitor for a week now. Also, how could I get the output that I'd like?
Thanks in adance!
Upvotes: 4
Views: 5589
Reputation: 60918
Try this: (not in the perfect world )
Import-Csv original.csv -delimiter ',' |
Sort-Object { [int]([regex]::Replace( $_.combined , "\D" , "" )) } , `
{ [regex]::Replace( $_.combined , "\d" , "" ) } -unique
Combined
--------
ARVALA
BRASAID
LUZREN
MEADOWCLAW
1 HONEY
1A THE BIG
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
Following the nice answer of @Aryadev this code sort also value starting with more than one digit:
$allToNumbers = { [int]([regex]::Replace( $_.combined , "\D" , "" )) }
$StartsWithNumber = { if ($_.Combined -match '^\d+\D') { $_.Combined } }
$IsNumber = { if ($_.Combined -match '^\d*$') { [int]$_.Combined } }
$OnlyLetters = { if ($_.Combined -imatch '^[a-z ]*$') { $_.Combined } }
Import-Csv original.csv |
Sort-Object $OnlyLetters, $allnumbers, $IsNumber, $StartsWithNumber -Unique
Upvotes: 1
Reputation: 902
Not sure I understand the sorting logic, but try this:
$StartsWithNumber = { if ($_.Combined -match '^\d*\D' ) { $_.Combined } }
$IsNumber = { if ($_.Combined -match '^\d*$') { [int]$_.Combined } }
$OnlyLetters = { if ($_.Combined -imatch '^[a-z ]*$') { $_.Combined } }
Import-Csv original.csv |
Sort-Object $OnlyLetters, $IsNumber, $StartsWithNumber -Unique
Combined
--------
1 HONEY
1A THE BIG
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
ARVALA
BRASAID
LUZREN
MEADOWCLAW
Upvotes: 4
Reputation: 54881
Import-CSV
imports all values as string
by default. That's why you get the result you see. It's sorted alphabetical. The "priority" in alphanumeric is: spaces, numbers, letters.
I don't think you can get the output you want without some serious code. My only suggestion is parsing the pure number to int so ex. a number 30 will come after 5. This can be done line this:
$intvalue = 10000
import-csv .\test.csv | % {
if ([int]::TryParse($_.Combined, [ref]$intvalue)) {
$_.Combined = $intvalue
}
$_
} | Sort-Object -Unique Combined
Combined
--------
1 HONEY
5
7
11
13
19
1A THE BIG
21
23
25
29
3 THE GOOD
3A SPRING
ARVALA
BRASAID
LUZREN
MEADOWCLAW
Upvotes: 0