jetgerbil
jetgerbil

Reputation: 41

Sorting a CSV file with Powershell

I'm trying to sort a large CSV file. The issue is with using sort-oject on a coloum called Combined. It doesn't sort the way I'd expect it to! For testing, I have removed all of the data that isn't being sorted and also reduced the ammount of lines.

Orginal CSV:

Combined
1A THE BIG
7
3A SPRING
19
LUZREN
21
23
25
29
1 HONEY
5
3 THE GOOD
11
ARVALA
BRASAID
13
MEADOWCLAW

Powershell:

Import-Csv orginal.csv -delimiter ',' | Sort-Object -Unique Combined 

Combined
--------
1 HONEY
11
13
19
1A THE BIG
21
23
25
29
3 THE GOOD
3A SPRING
5
7
ARVALA
BRASAID
LUZREN
MEADOWCLAW

What I expected was this:

Combined
1A THE BIG
5
7
11
13
19
21
23
25
29
1 HONEY
3 THE GOOD
3A SPRING
ARVALA
BRASAID
LUZREN
MEADOWCLAW

In a perfect world I'd like this:

Combined
1A THE BIG
1 HONEY
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
ARVALA
BRASAID
LUZREN
MEADOWCLAW

Please can someone explain to me why this is happening I've banging my head againt my monitor for a week now. Also, how could I get the output that I'd like?

Thanks in adance!

Upvotes: 4

Views: 5589

Answers (3)

CB.
CB.

Reputation: 60918

Try this: (not in the perfect world )

Import-Csv original.csv -delimiter ',' | 
Sort-Object { [int]([regex]::Replace( $_.combined , "\D" , "" )) } ,  `
{ [regex]::Replace( $_.combined , "\d" , "" ) } -unique

Combined
--------
ARVALA
BRASAID
LUZREN
MEADOWCLAW
1 HONEY
1A THE BIG
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29

Following the nice answer of @Aryadev this code sort also value starting with more than one digit:

$allToNumbers = { [int]([regex]::Replace( $_.combined , "\D" , "" )) }
$StartsWithNumber = { if ($_.Combined -match '^\d+\D') { $_.Combined } }
$IsNumber = { if ($_.Combined -match '^\d*$') { [int]$_.Combined } }
$OnlyLetters = { if ($_.Combined -imatch '^[a-z ]*$') { $_.Combined } }

Import-Csv original.csv | 
Sort-Object  $OnlyLetters, $allnumbers, $IsNumber,  $StartsWithNumber -Unique

Upvotes: 1

Alexey A.
Alexey A.

Reputation: 902

Not sure I understand the sorting logic, but try this:

$StartsWithNumber = { if ($_.Combined -match '^\d*\D' ) { $_.Combined } }
$IsNumber = { if ($_.Combined -match '^\d*$') { [int]$_.Combined } }
$OnlyLetters = { if ($_.Combined -imatch '^[a-z ]*$') { $_.Combined } }

Import-Csv original.csv | 
Sort-Object $OnlyLetters, $IsNumber, $StartsWithNumber -Unique

Combined
--------
1 HONEY
1A THE BIG
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
ARVALA
BRASAID
LUZREN
MEADOWCLAW

Upvotes: 4

Frode F.
Frode F.

Reputation: 54881

Import-CSV imports all values as string by default. That's why you get the result you see. It's sorted alphabetical. The "priority" in alphanumeric is: spaces, numbers, letters.

I don't think you can get the output you want without some serious code. My only suggestion is parsing the pure number to int so ex. a number 30 will come after 5. This can be done line this:

$intvalue = 10000
import-csv .\test.csv | % {
    if ([int]::TryParse($_.Combined, [ref]$intvalue)) {
        $_.Combined = $intvalue
    }
    $_
} | Sort-Object -Unique Combined

Combined
--------
1 HONEY
5
7
11
13
19
1A THE BIG
21
23
25
29
3 THE GOOD
3A SPRING
ARVALA
BRASAID
LUZREN 
MEADOWCLAW 

Upvotes: 0

Related Questions