ubuntuquestions

Reputation: 45

Bash sort by character position in column

I would like to sort the file below by characters 7 through 9 of the 2nd column.

$ cat sample.bed
chr1    248956422       chr1:248956422
chr2    242193529       chr2:242193529
chr3    198295559       chr3:198295559
chr4    190214555       chr4:190214555
chr5    181538259       chr5:181538259
chr6    170805979       chr6:170805979
chr7    159345973       chr7:159345973
chrX    156040895       chrX:156040895
chr8    145138636       chr8:145138636
chr9    138394717       chr9:138394717

I use sort as shown and get the below output:

$ sort -n -k2.7,2.9 sample.bed
chr4    190214555       chr4:190214555
chr6    170805979       chr6:170805979
chr5    181538259       chr5:181538259
chr2    242193529       chr2:242193529
chr8    145138636       chr8:145138636
chrX    156040895       chrX:156040895
chr3    198295559       chr3:198295559
chr9    138394717       chr9:138394717
chr1    248956422       chr1:248956422
chr7    159345973       chr7:159345973

Sort changes the row order, but not based on my parameters. Note that sort -k2,2 works as expected:

$ sort -k2,2 sample.bed
chr9    138394717       chr9:138394717
chr8    145138636       chr8:145138636
chrX    156040895       chrX:156040895
chr7    159345973       chr7:159345973
chr6    170805979       chr6:170805979
chr5    181538259       chr5:181538259
chr4    190214555       chr4:190214555
chr3    198295559       chr3:198295559
chr2    242193529       chr2:242193529
chr1    248956422       chr1:248956422

I must be missing something obvious... Any help would be greatly appreciated.

Upvotes: 2

Views: 4070

Answers (1)

KamilCuk

Reputation: 140960

The output of sort --debug is very informative:

# sort -n -k2.7,2.9 --debug
...
chr4    190214555       chr4:190214555
          ___
______________________________________
...

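On the chr4 line it compares 021 rather than 555. Without -t, sort counts the blanks that precede a field as part of that field, so the four spaces before column 2 occupy character positions 1 to 4 and position 7 lands on the third digit. You can either shift the offsets by those four blanks: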

sort -n -k2.11,2.13

or ignore leading blanks with -b:

sort -b -n -k2.7,2.9
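
To check both variants against the data from the question, here is a minimal sketch. It assumes GNU sort and that column 2 is preceded by exactly four spaces, as in the listing above; the temporary file and the variable names (tmp, with_b, with_offset) are only for illustration.

```shell
# Recreate sample.bed from the question in a temporary file.
# Assumption: columns are separated by runs of spaces (four before column 2).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
chr1    248956422       chr1:248956422
chr2    242193529       chr2:242193529
chr3    198295559       chr3:198295559
chr4    190214555       chr4:190214555
chr5    181538259       chr5:181538259
chr6    170805979       chr6:170805979
chr7    159345973       chr7:159345973
chrX    156040895       chrX:156040895
chr8    145138636       chr8:145138636
chr9    138394717       chr9:138394717
EOF

# Variant 1: -b skips the leading blanks, so 2.7 is the 7th digit of the number.
with_b=$(LC_ALL=C sort -b -n -k2.7,2.9 "$tmp")

# Variant 2: no -b, offsets shifted by the four leading blanks (7+4=11, 9+4=13).
with_offset=$(LC_ALL=C sort -n -k2.11,2.13 "$tmp")

# Both variants should order the rows by the last three digits of column 2:
# chr5 (259), chr1 (422), chr2 (529), chr4 (555), chr3 (559),
# chr8 (636), chr9 (717), chrX (895), chr7 (973), chr6 (979)
printf '%s\n' "$with_b"

rm -f "$tmp"
```

The -b form is the more robust of the two, since the shifted offsets only work while every row has the same number of blanks before column 2.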

Upvotes: 5
