neversaint

Reputation: 63994

How To Sort Tab Format File Based on Length of Column K

I have a space-delimited tabular file that looks like this:

>NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG
>NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG
>NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC

How can I use Unix sort to sort it by the length of the DNA sequence (the last column, made up of [ATCG])?

Upvotes: 4

Views: 1393

Answers (4)

Dimitre Radoulov

Reputation: 28000

With Perl:

perl -e'
  print sort {
    length +($a =~ /(\S+)$/)[0] 
      <=>
    length +($b =~ /(\S+)$/)[0]
  } <>' infile

With GNU awk:

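WHINY_USERS= gawk 'END {
  # WHINY_USERS makes older gawk traverse the array in sorted index order
  for (L in l) print l[L]
  }
{
  # pad the length so string order matches numeric order;
  # append NR so lines with equal lengths do not overwrite each other
  l[sprintf("%15d%15d", length($NF), NR)] = $0
  }' infile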

Upvotes: 1

ghostdog74

Reputation: 342333

 awk '{print length($NF) $0|"sort -n"}' file | sed 's/^.[^>]*>/>/'

Upvotes: 1

josephj1989

Reputation: 9709

This pipelined command will figure out the length as well. My Unix is a bit rusty; I have been doing other things for a while.

$ awk '{printf("%d %s\n", length($NF), $0)}' junk.lst|sort -n -k1,1|sed 's/^[0-9]* //'

Upvotes: 3

Slartibartfast

Reputation: 1700

If the length is in the 4th column, sort -n -k4 should do the trick.

If the answer needs to figure out the length, then you're looking for a preprocessing step before sort. Perhaps a Python script that just prints out the length of the 7th space-separated column as an extra first or last column, which sort can then use as its key.
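A minimal sketch of such a preprocessing step, assuming the sequence is always the 7th whitespace-separated field as in the sample from the question. The function name with_length_prefix and the choice to skip short lines are illustrative assumptions, not something from the thread:

```python
def with_length_prefix(lines, column=7):
    """Yield each line prefixed with the length of its column-th field and a tab.

    `column` is 1-based. The default of 7 matches the sample data in the
    question and is an assumption about the real file.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        if len(fields) < column:
            # Assumption: silently skip blank or malformed lines.
            continue
        yield f"{len(fields[column - 1])}\t{line}"


# Sample rows copied from the question.
sample = [
    ">NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG",
    ">NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG",
    ">NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC",
]

decorated = list(with_length_prefix(sample))
for row in decorated:
    print(row)
```

To use it as a filter, one could pass sys.stdin instead of sample and pipe the output through something like sort -n -k1,1 | cut -f2- to sort on the new first column and then strip it again. That relies on the original lines containing no tab characters.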

Upvotes: 6
