Aravindh

Reputation: 1

UNIX Sort Command

I need to sort the data in a file. The sort order is column 7, then column 2. The last column (column 8) is empty:

    1|1|1|1|1|1|12333|    
    3|3|3|3|3|3|44454|    
    2|2|2|2|2|2|22222|    
    1|1|1|1|1|1|123300000|    

When I use the following command, the rows in the output file come out in an unexpected order:

    sort -o /test1/FILE2 -T /test1/Junk -t\| -k7,7 -k2,2 /test1/Junk/FILE2_1

Values in the output file:

    1|1|1|1|1|1|123300000|    
    1|1|1|1|1|1|12333|    
    2|2|2|2|2|2|22222|    
    3|3|3|3|3|3|44454|    

Any idea why the row containing 123300000 is coming up first?

I need the output sorted like this:

    1|1|1|1|1|1|12333|    
    1|1|1|1|1|1|123300000|    
    2|2|2|2|2|2|22222|    
    3|3|3|3|3|3|44454|    

Upvotes: 0

Views: 1732

Answers (2)

user13608932

Reputation: 41

The ordering is lexicographical, as you said. Your command is almost correct; add n to the sort command so the keys are compared numerically, like this:

echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|' | sort -t \| -nk7,7 -nk2,2

This will sort the data numerically.
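As a variation, the n modifier can also be attached to each key definition rather than given as a separate option. The following is only a sketch using the sample rows from the question; the variable names `data` and `sorted` are made up for illustration.

```shell
# Sample rows from the question (the eighth field is empty)
data='1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123300000|'

# The n suffix on each -k definition makes that key compare numerically
sorted=$(printf '%s\n' "$data" | sort -t '|' -k7,7n -k2,2n)
printf '%s\n' "$sorted"
```

With this, the row containing 12333 comes first and the row containing 123300000 comes last, because column 7 is compared by numeric value.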

Upvotes: 0

paxdiablo

Reputation: 882466

Normally, you choose either numeric or lexicographical (dictionary) ordering.

If you wanted those values sorted numerically, you would need a -n in your sort command:

pax> echo '1|1|1|1|1|1|12333|    
3|3|3|3|3|3|44454|    
2|2|2|2|2|2|22222|    
1|1|1|1|1|1|123300000|' | sort -t \| -k7,7 -k2,2 -n

1|1|1|1|1|1|12333|    
2|2|2|2|2|2|22222|    
3|3|3|3|3|3|44454|    
1|1|1|1|1|1|123300000|

If, on the other hand, you don't want it sorted numerically, then the output you have is already correct as far as I can see:

                v
1|1|1|1|1|1|123300000|    
1|1|1|1|1|1|12333|    
                ^

Note the highlighted characters. Since 0 comes before 3, this is the right lexicographical order.
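To see that character-by-character comparison in isolation, you can sort just the two column 7 values as plain strings. This is a minimal sketch; forcing `LC_ALL=C` is my own addition to make the comparison byte-wise, and `ordered` is just an illustrative variable name.

```shell
# Sort only the two column 7 values as strings, using byte-wise collation
ordered=$(printf '%s\n' 12333 123300000 | LC_ALL=C sort)
printf '%s\n' "$ordered"
```

The first four characters (1233) are identical, so the fifth character decides the order, and 0 sorts before 3.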

Changing that large value to 123330000 results in the order you seem to be after:

pax> echo '1|1|1|1|1|1|12333|    
3|3|3|3|3|3|44454|    
2|2|2|2|2|2|22222|    
1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2

1|1|1|1|1|1|12333|    
1|1|1|1|1|1|123330000|
2|2|2|2|2|2|22222|    
3|3|3|3|3|3|44454|   

Hence I suspect you're just misreading the data in this case.


If, as you state in a comment, the test data was incorrect, the presence or absence of the final | character should make no difference to the sort order. First, lexicographical sorting with and without |:

pax> echo ; echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2

1|1|1|1|1|1|12333|
1|1|1|1|1|1|123330000|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|

pax> echo ; echo '1|1|1|1|1|1|12333
3|3|3|3|3|3|44454
2|2|2|2|2|2|22222
1|1|1|1|1|1|123330000' | sort -t \| -k7,7 -k2,2

1|1|1|1|1|1|12333
1|1|1|1|1|1|123330000
2|2|2|2|2|2|22222
3|3|3|3|3|3|44454

You can see there that 123330000 is second in both cases.

Similarly, for numerical sorting with and without |, the larger number appears at the end:

pax> echo ; echo '1|1|1|1|1|1|12333| 
3|3|3|3|3|3|44454| 
2|2|2|2|2|2|22222| 
1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2 -n

1|1|1|1|1|1|12333|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
1|1|1|1|1|1|123330000|

pax> echo ; echo '1|1|1|1|1|1|12333 
3|3|3|3|3|3|44454 
2|2|2|2|2|2|22222 
1|1|1|1|1|1|123330000' | sort -t \| -k7,7 -k2,2 -n

1|1|1|1|1|1|12333
2|2|2|2|2|2|22222
3|3|3|3|3|3|44454
1|1|1|1|1|1|123330000

If you're seeing something else, then either your sort is broken or it's configured strangely. If that's the case, you might want to investigate whether you have a sort function or alias overriding the real one (with type sort or which sort, for example; which may not report shell functions or aliases), or whether you have a bizarre LC_ALL setting, which affects the comparison function used for sorting.
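A rough sketch of those checks follows. It uses two rows from the question as inline input rather than your real file, and the variable name `c_sorted` is only illustrative. Running the sort with `LC_ALL=C` is one way to rule out the locale, since it forces plain byte-wise comparison.

```shell
# Show how the shell resolves "sort": an alias, a function, or a file on disk
type sort

# Show the locale variables that can influence collation (empty if unset)
printf 'LC_ALL=%s LC_COLLATE=%s LANG=%s\n' "${LC_ALL-}" "${LC_COLLATE-}" "${LANG-}"

# Rerun the lexicographical sort with byte-wise collation forced
c_sorted=$(printf '%s\n' '1|1|1|1|1|1|12333|' '1|1|1|1|1|1|123300000|' |
    LC_ALL=C sort -t '|' -k7,7 -k2,2)
printf '%s\n' "$c_sorted"
```

If the order under `LC_ALL=C` differs from what you get normally, the locale is the likely culprit.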

With GNU sort, at least, you can also use --debug to annotate the output, indicating which line portions are used as keys.
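Something along these lines should show the annotations. It is only a sketch: `--debug` is a GNU extension, so the snippet probes for it first, and `debug_out` is an illustrative variable name. The exact annotation text varies between coreutils versions, so no expected output is shown.

```shell
# --debug is a GNU coreutils extension, so probe for it before relying on it
if sort --debug </dev/null >/dev/null 2>&1; then
    # Capture stderr too, since the diagnostic notes are written there
    debug_out=$(printf '%s\n' '1|1|1|1|1|1|12333|' '1|1|1|1|1|1|123300000|' |
        sort --debug -t '|' -k7,7 -k2,2 2>&1)
else
    debug_out=''
    echo 'this sort does not support --debug' >&2
fi
printf '%s\n' "$debug_out"
```

Each input line is followed by underscores marking the part of the line that was used for each key, which makes it easy to spot a key that covers more or less than you intended.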

And, finally, one other possibility may be the presence of non-printing characters in your input that may be affecting sort order. You can detect these by getting a hex dump of the file and checking it:

od -xcb /test1/Junk/FILE2_1
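As an illustration of what to look for, here is a sketch that dumps a single made-up line ending in a carriage return, as you might get from a file saved with Windows line endings. The stray character is my own hypothetical example, not something known to be in your file, and `dump` is an illustrative variable name.

```shell
# Hypothetical input line with a stray carriage return before the newline
dump=$(printf '1|1|1|1|1|1|12333|\r\n' | od -c)
printf '%s\n' "$dump"
```

In the `od -c` output the carriage return shows up as `\r` just before `\n`. Any escape or octal code you don't expect in your own dump is worth a closer look.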

Upvotes: 4
