Reputation: 21
I have a list of files like this in Linux:
440c0402 mfcc.2.ark:15681
440c0401 mfcc.1.ark:501177
440c0401 mfcc.1.ark:9
440c0403 mfcc.3.ark:516849
When I try to sort them using the sort command in Linux, I get:
440c0401 mfcc.1.ark:501177
440c0401 mfcc.1.ark:9
440c0402 mfcc.2.ark:15681
440c0403 mfcc.3.ark:516849
The first and second lines should be reversed, because 501177 > 9. The list is large, so this happens in several places. Does anybody have an idea how I can resolve this problem?
Upvotes: 1
Views: 86
Reputation: 21
Thanks, everybody. I tried this and it worked for me:
sort -t ':' -k1,1 -k 2n,2 <filename>
Upvotes: 0
Reputation: 67467
A simpler version:
$ sort -t: -k1,1 -k2n file
440c0401 mfcc.1.ark:9
440c0401 mfcc.1.ark:501177
440c0402 mfcc.2.ark:15681
440c0403 mfcc.3.ark:516849
For fixed-length fields, numerical and lexical sorting behave the same; for variable-length numbers they differ (there are no leading zeros!).
This splits each line in two at ":". The first part is fixed length, so it needs no special care, but for the second part you have to add the n suffix to request numerical sorting.
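To make the lexical/numerical difference concrete, here is a minimal sketch that runs both variants on the sample data from the question. The temporary file and the LC_ALL=C setting are my own additions (the latter only to keep the comparison locale-independent); they are not part of the answer above.

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  '440c0402 mfcc.2.ark:15681' \
  '440c0401 mfcc.1.ark:501177' \
  '440c0401 mfcc.1.ark:9' \
  '440c0403 mfcc.3.ark:516849' > "$tmp"

# Both keys compared as strings: "501177" sorts before "9" because '5' < '9'.
lexical=$(LC_ALL=C sort -t: -k1,1 -k2,2 "$tmp")

# Second key compared as a number: 9 sorts before 501177.
numeric=$(LC_ALL=C sort -t: -k1,1 -k2,2n "$tmp")

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
rm -f "$tmp"
```

Only the two lines that share the first field change places between the two runs; the rest of the order is decided by the first key alone.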
Upvotes: 3
Reputation: 3925
This is what you want to use:
sort -t: -k 1.5,1.8n -k 2.1,2.7n inputfile
440c0401 mfcc.1.ark:9
440c0401 mfcc.1.ark:501177
440c0402 mfcc.2.ark:15681
440c0403 mfcc.3.ark:516849
-t sets the field separator (that character should not be used elsewhere in the input file).
-k defines a sort key (it can be given more than once).
-k 1.5,1.8n means: sort numerically (that is what the n does) by field 1, from the 5th to the 8th character.
The second -k tells sort to use field 2, from the 1st to the 7th character, numerically.
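To see which characters those field.character positions pick out, here is a small sketch that uses cut on one sample line. cut is only a stand-in for illustration; sort does not call it, and the variable names are hypothetical.

```shell
# One line from the sample data.
line='440c0401 mfcc.1.ark:501177'

# Field 1 (before the ':'), characters 5 through 8.
key1=$(printf '%s\n' "$line" | cut -d: -f1 | cut -c5-8)

# Field 2 (after the ':'), characters 1 through 7.
key2=$(printf '%s\n' "$line" | cut -d: -f2 | cut -c1-7)

printf 'key 1: %s\nkey 2: %s\n' "$key1" "$key2"
# key 1: 0401
# key 2: 501177
```

Note that the second key stops at the 7th character, so this form assumes the number after the colon never has more than 7 digits.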
Upvotes: 1
Reputation: 46816
This works for me:
$ sort -n -t: -k2 inputfile
440c0401 mfcc.1.ark:9
440c0402 mfcc.2.ark:15681
440c0401 mfcc.1.ark:501177
440c0403 mfcc.3.ark:516849
The options I'm using are: -n sorts numerically, -t: sets a field separator of :, and -k2 tells sort to consider the second field of each line.
It goes without saying (but I'll say it anyway) that this depends on your lines following the format in your sample data.
I'm doing this in FreeBSD, but I believe the sort options I'm using are portable to Linux.
Upvotes: 2
Reputation: 5582
File names are string data, not numeric.
Therefore, when you sort them, the characters are compared one at a time, starting from the first character position.
So for the entry ending in 501177, the character 5 is compared against 9, and 5 comes first.
But what you have in mind is numeric values.
If you want to compare them numerically, you'll need to extract that part, cast it (depending on your programming language) and compare the results.
Oh, and by the way, this is not a Linux-specific issue. Rather, it's the way computers process strings by default.
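As a rough illustration of "extract, cast and compare" in plain shell, assuming the number is always whatever follows the last colon (the variable names are made up for this sketch):

```shell
a='440c0401 mfcc.1.ark:501177'
b='440c0401 mfcc.1.ark:9'

# Extract the part after the last ':'.
off_a=${a##*:}
off_b=${b##*:}

# Compared as strings (byte order), "501177" comes first.
first_as_string=$(printf '%s\n' "$off_a" "$off_b" | LC_ALL=C sort | head -n 1)

# Compared as integers with -lt, 9 is the smaller value.
if [ "$off_a" -lt "$off_b" ]; then
  smaller=$off_a
else
  smaller=$off_b
fi

printf 'first as string: %s\nsmaller as number: %s\n' "$first_as_string" "$smaller"
# first as string: 501177
# smaller as number: 9
```

The -lt operator of test is what does the "cast" here: it treats both operands as integers instead of comparing them character by character.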
Upvotes: 2