Ramon

Reputation: 21

Sorting Issue in Linux

I have a list of files like this in Linux:

440c0402 mfcc.2.ark:15681
440c0401 mfcc.1.ark:501177
440c0401 mfcc.1.ark:9
440c0403 mfcc.3.ark:516849

When I try to sort them using the sort command in Linux, I get:

440c0401 mfcc.1.ark:501177
440c0401 mfcc.1.ark:9
440c0402 mfcc.2.ark:15681
440c0403 mfcc.3.ark:516849

The first and second lines should be reversed because 501177 > 9. The same thing happens in other places too, because the list is large. Does anybody have an idea how I can resolve this problem?

Upvotes: 1

Views: 86

Answers (5)

Ramon

Reputation: 21

Thanks, everybody. I tried this and it worked for me:

sort -t ':' -k1,1 -k 2n,2 <filename>

Upvotes: 0

karakfa

Reputation: 67467

A simpler version:

$ sort -t: -k1,1 -k2n file

440c0401 mfcc.1.ark:9
440c0401 mfcc.1.ark:501177
440c0402 mfcc.2.ark:15681
440c0403 mfcc.3.ark:516849

For fixed-length fields, numerical and lexical sorting behave the same; for variable-length numbers they differ (there are no leading zeros!).

This splits each line in two at the ":". The first part is fixed length, so it needs no special care, but for the second part you have to add the n suffix to request numerical sorting.
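As a rough illustration of the difference described above, the following sketch runs both kinds of sort on the four sample lines from the question. The temporary file and the variable names `lexical` and `numeric` are only there for the demonstration, and `LC_ALL=C` is an added assumption to keep the text comparison byte-based and reproducible.

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  '440c0402 mfcc.2.ark:15681' \
  '440c0401 mfcc.1.ark:501177' \
  '440c0401 mfcc.1.ark:9' \
  '440c0403 mfcc.3.ark:516849' > "$tmp"

# Plain sort: each line is compared as text, so ":5..." sorts before ":9".
lexical=$(LC_ALL=C sort "$tmp")

# Field 1 as text, field 2 as a number: 9 now sorts before 501177.
numeric=$(LC_ALL=C sort -t: -k1,1 -k2n "$tmp")

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
rm -f "$tmp"
```

Only the order of the two 440c0401 lines should differ between the two results.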

Upvotes: 3

0xbadc0de

Reputation: 3925

This is what you want to use:

sort -t: -k 1.5,1.8n -k 2.1,2.7n inputfile

440c0401 mfcc.1.ark:9
440c0401 mfcc.1.ark:501177
440c0402 mfcc.2.ark:15681
440c0403 mfcc.3.ark:516849

-t sets the field separator (that character should not be used elsewhere in the input file).

-k specifies a sort key (it can be given more than once). -k 1.5,1.8n means: sort numerically (that is what the n says) on field 1, from its 5th to its 8th character.

The second -k tells sort to use field 2, from its first to its 7th character, numerically.
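To see which characters those two keys cover, here is a small sketch that pulls out the same spans with `cut` instead of `sort`. It is only an approximation for illustration: `cut` is used to mimic the field and character positions, and the variable names `key1` and `key2` are made up for this example.

```shell
line='440c0401 mfcc.1.ark:501177'

# Field 1 (text before the colon), characters 5 through 8:
# the span that -k 1.5,1.8 selects.
key1=$(printf '%s\n' "$line" | cut -d: -f1 | cut -c5-8)

# Field 2 (text after the colon), characters 1 through 7:
# the span that -k 2.1,2.7 selects.
key2=$(printf '%s\n' "$line" | cut -d: -f2 | cut -c1-7)

printf 'key1=%s key2=%s\n' "$key1" "$key2"
```

Note that a key ending at character 7 only looks at the first seven characters of the second field, so this form assumes the numbers never get longer than that.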

Upvotes: 1

ghoti

Reputation: 46816

This works for me:

$ sort -n -t: -k2 inputfile
440c0401 mfcc.1.ark:9
440c0402 mfcc.2.ark:15681
440c0401 mfcc.1.ark:501177
440c0403 mfcc.3.ark:516849

The options I'm using are:

  • -n sorts numerically,
  • -t: sets a field separator of :, and
  • -k2 tells sort to consider the second field of each line.

It goes without saying (but I'll say it anyway) that this depends on your lines following the format in your sample data.

I'm doing this in FreeBSD, but I believe the sort options I'm using are portable to Linux.
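Since the approach depends on every line following the sample format, it may be worth checking a large list before sorting it. The sketch below counts lines that are not of the form "text without colons, one colon, digits". The pattern is an assumption based on the sample data, and the two malformed lines are invented purely for the demonstration.

```shell
# Two well-formed lines from the question plus two made-up malformed ones.
tmp=$(mktemp)
printf '%s\n' \
  '440c0401 mfcc.1.ark:9' \
  '440c0402 mfcc.2.ark:15681' \
  'broken line without a number' \
  '440c0403 mfcc.3.ark:abc' > "$tmp"

# Count lines that do NOT look like "<text without colons>:<digits>".
# grep exits non-zero when it selects no lines, hence the "|| true".
bad=$(grep -cvE '^[^:]+:[0-9]+$' "$tmp" || true)

printf 'lines not matching the expected format: %s\n' "$bad"
rm -f "$tmp"
```

A count of 0 would suggest that the colon-separated numeric key is safe to rely on for the whole file.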

Upvotes: 2

itsols

Reputation: 5582

File names are string data, not numeric data.

Therefore, when you sort them, the characters get compared starting from the first character position onwards.

So for the entry ending in 501177, the 5 is compared against the 9, and 5 comes first.

But you are thinking of them as numeric values.

If you want to compare them numerically, you'll need to extract that part, cast it (depending on your programming language) and compare the results.

Oh, and by the way, this is not a Linux-specific issue. Rather, it's the way computers process strings by default.
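A minimal sketch of the extract-and-compare idea in plain shell, using two lines from the question. The variable names are made up for the example, the part after the last colon is assumed to be a plain integer, and `LC_ALL=C` is added so the text comparison is byte-based.

```shell
a='440c0401 mfcc.1.ark:501177'
b='440c0401 mfcc.1.ark:9'

# Extract the part after the last colon.
na=${a##*:}
nb=${b##*:}

# Compared as text (character by character): "501177" sorts before "9".
smaller_as_text=$(printf '%s\n' "$na" "$nb" | LC_ALL=C sort | head -n 1)

# Compared as integers: 9 is smaller than 501177.
if [ "$na" -lt "$nb" ]; then
  smaller_as_number=$na
else
  smaller_as_number=$nb
fi

printf 'smaller as text: %s, smaller as number: %s\n' \
  "$smaller_as_text" "$smaller_as_number"
```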

Upvotes: 2
