Reputation: 1
I need to sort the data in a file. The sort order is column 7, then column 2. The last column (column 8) is empty:
1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123300000|
When I use the following command I get a strange value in the output file:
sort -o /test1/FILE2 -T /test1/Junk -t\| -k7,7 -k2,2 /test1/Junk/FILE2_1
Where:
/test1/FILE2 is the output file (the argument to -o)
/test1/Junk is the temporary directory
/test1/Junk/FILE2_1 is the input file
Values in the output file:
1|1|1|1|1|1|123300000|
1|1|1|1|1|1|12333|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
Any idea why the row containing 123300000 is coming up first?
I need the sorting like below:
1|1|1|1|1|1|12333|
1|1|1|1|1|1|123300000|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
Upvotes: 0
Views: 1732
Reputation: 41
The ordering is lexicographical, as you said. Your command is almost correct, but add the n option to the sort command so the keys are compared as numbers, like this:
echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|' | sort -t \| -nk7,7 -nk2,2
This will sort the data numerically.
Upvotes: 0
Reputation: 882466
Normally, you choose either numeric or lexicographical (dictionary) ordering.
If you wanted those values sorted numerically, you would need a -n in your sort command:
pax> echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123300000|' | sort -t \| -k7,7 -k2,2 -n
1|1|1|1|1|1|12333|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
1|1|1|1|1|1|123300000|
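As a small variation on the command above, the n modifier can also be attached to each key instead of being given globally, which makes it explicit which columns are compared as numbers. This is only a sketch using the sample rows from the question; the variable names data and sorted are introduced here for illustration.

```shell
# Sample rows from the question; column 8 is empty.
data='1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123300000|'

# The trailing n on each key definition makes that key numeric:
# sort by column 7 as a number, then by column 2 as a number.
sorted=$(printf '%s\n' "$data" | sort -t '|' -k7,7n -k2,2n)
printf '%s\n' "$sorted"
```

With per-key modifiers, a later key could be left as a plain string comparison simply by omitting its n.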
If, on the other hand, you don't want it sorted numerically, then the output you have is already correct as far as I can see:
                v
1|1|1|1|1|1|123300000|
1|1|1|1|1|1|12333|
                ^
Note the highlighted characters. Since 0 comes before 3, this is the right lexicographical order.
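To see that character-by-character comparison in isolation, one can sort just the two column-7 values as plain strings. Setting LC_ALL=C here is an assumption made for the sketch: it forces byte-wise comparison so the result does not depend on the local locale. The variable name first is illustrative.

```shell
# Compare only the two column-7 values, as strings rather than numbers.
# LC_ALL=C forces byte-wise comparison, independent of the user's locale.
first=$(printf '%s\n' 12333 123300000 | LC_ALL=C sort | head -n 1)

# The value whose fifth character is 0 sorts ahead of the one with 3 there.
printf '%s\n' "$first"
```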
Changing that large value to 123330000 results in the order you seem to be after:
pax> echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2
1|1|1|1|1|1|12333|
1|1|1|1|1|1|123330000|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
Hence I suspect you're just misreading the data in this case.
If, as you state in a comment, the test data was incorrect, the presence or absence of the final | character should make no difference to the sort order. First, lexicographical sorting with and without |:
pax> echo ; echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2
1|1|1|1|1|1|12333|
1|1|1|1|1|1|123330000|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
pax> echo ; echo '1|1|1|1|1|1|12333
3|3|3|3|3|3|44454
2|2|2|2|2|2|22222
1|1|1|1|1|1|123330000' | sort -t \| -k7,7 -k2,2
1|1|1|1|1|1|12333
1|1|1|1|1|1|123330000
2|2|2|2|2|2|22222
3|3|3|3|3|3|44454
You can see there that 123330000 is second in both cases.
Similarly, for numerical sorting with and without |, the larger number appears at the end:
pax> echo ; echo '1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2 -n
1|1|1|1|1|1|12333|
2|2|2|2|2|2|22222|
3|3|3|3|3|3|44454|
1|1|1|1|1|1|123330000|
pax> echo ; echo '1|1|1|1|1|1|12333
3|3|3|3|3|3|44454
2|2|2|2|2|2|22222
1|1|1|1|1|1|123330000' | sort -t \| -k7,7 -k2,2 -n
1|1|1|1|1|1|12333
2|2|2|2|2|2|22222
3|3|3|3|3|3|44454
1|1|1|1|1|1|123330000
If you're seeing something else, then either your sort is broken or it's configured strangely. If that's the case, you might want to investigate whether you have a sort function or alias overriding the real one (with which sort, for example), or whether you have a bizarre LC_ALL setting, which affects the comparison function used for sorting.
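Those checks could look something like the following sketch. It uses the shell builtin type rather than which, since type also reports aliases and functions, and it re-runs a sort under LC_ALL=C to rule out locale effects. The three-row sample and the variable name c_sorted are made up for illustration.

```shell
# Report what the shell will actually run for "sort":
# an alias, a function, or an executable found on PATH.
type sort

# Show the locale variables that most directly influence collation.
printf 'LC_ALL=%s LC_COLLATE=%s LANG=%s\n' "${LC_ALL-}" "${LC_COLLATE-}" "${LANG-}"

# Sort a tiny sample with byte-wise collation forced.
# In the C locale, uppercase letters sort before lowercase ones.
c_sorted=$(printf '%s\n' 'b|1' 'A|2' 'a|3' | LC_ALL=C sort -t '|' -k1,1)
printf '%s\n' "$c_sorted"
```

If the order produced with LC_ALL=C differs from what plain sort gives on the same input, the locale is the likely cause.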
With GNU sort, at least, you can also use --debug to annotate the output, indicating which line portions are used as keys.
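A minimal sketch of that, assuming GNU sort is the one installed (other implementations may reject --debug). Two rows from the question are used, and the variable name debug_out is illustrative.

```shell
# --debug is a GNU sort extension; other implementations may not accept it.
# The annotated lines go to stdout; diagnostic notes go to stderr (discarded here).
debug_out=$(printf '%s\n' '1|1|1|1|1|1|12333|' '1|1|1|1|1|1|123300000|' |
  sort --debug -t '|' -k7,7 -k2,2 2>/dev/null)

# Each input line is followed by underscore markers beneath the key portions.
printf '%s\n' "$debug_out"
```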
And, finally, one other possibility may be the presence of non-printing characters in your input that may be affecting sort order. You can detect these by getting a hex dump of the file and checking it:
od -xcb /test1/Junk/FILE2_1
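As an illustration of what to look for, the sketch below dumps a made-up two-row sample in which the second row ends with a carriage return, as it would in a file saved with Windows line endings. Only od -c is used here to keep the output short; the sample data and the variable name dump are assumptions for the example.

```shell
# Hypothetical sample: the second row ends in CR LF instead of just LF.
dump=$(printf '1|12333|\n1|123300000|\r\n' | od -c)

# In the dump, the stray carriage return shows up as \r just before \n.
printf '%s\n' "$dump"
```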
Upvotes: 4