LookIntoEast

Reputation: 8798

Use bash commands to sort a list by specific columns

I have a list of data with four columns, like below:

chr1    9778939 10199603    DEL
chr1    143804138   143808614   DEL
chr1    8541961 8757598 DEL
chr1    141480516   141909199   INV
chr1    3902285 4665319 INV
chr1    10212548    10467934    DEL
chr1    225767517   226730696   INV
chr1    10807309    11011343    DEL
chr1    23663773    23957334    DEL
chr1    4468523 4665322 DEL
chr1    24458662    24704306    DEL
....
....
chr2
....
....
chr10
....
....
chr22
....
....
chrX
....
....
chrY
....
....

I hope to:

  1. First, sort by chromosome: chr1, chr2, chr3, ... up to chr22, then chrX and chrY. If I simply use sort -n, the order comes out as chr10, chr1, chr11, and so on. I want to sort by the numeric value in the first column.

  2. Then, within each chromosome (chr1, chr2, ...), how can I sort by the last column, that is, "DEL" or "INV"?

  3. Then sort by the second column, again by numeric value. For example, 104000 should come after 10500 because 104000 > 10500, not before it based on comparing the third digits (4 and 5).

Thanks. I hope I've made it clear.
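A quick sketch of the difference behind points 1 and 3, using made-up inputs. It forces the C locale (LC_ALL=C) so the byte-wise order is reproducible; other locales may order the chr names differently.

```shell
#!/bin/sh
# Plain sort compares strings character by character. In the C locale,
# "chr1" < "chr10" (prefix) < "chr2" ('1' < '2'), so chr10 lands before chr2.
chroms=$(printf '%s\n' chr10 chr2 chr1 | LC_ALL=C sort | tr '\n' ' ')

# "104000" and "10500" first differ at the third character ('4' < '5'),
# so a text sort puts 104000 first...
lexical=$(printf '%s\n' 104000 10500 | LC_ALL=C sort | tr '\n' ' ')

# ...while sort -n compares the whole leading number: 10500 < 104000.
numeric=$(printf '%s\n' 104000 10500 | LC_ALL=C sort -n | tr '\n' ' ')

echo "chromosomes: $chroms"
echo "text sort:   $lexical"
echo "numeric:     $numeric"
```

This is why both answers split the chromosome number out of "chrN" before sorting: only then can sort -n see it as a number.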

Upvotes: 1

Views: 4398

Answers (2)

ArthurG

Reputation: 71

Convert X and Y to 23 and 24 to sort numerically, and then back after the sort.

sed -e 's/^chr/chr /' -e 's/^chr X/chr 23/' -e 's/^chr Y/chr 24/' file | sort -k2,2n -k5b,5 -k3,3n | sed -e 's/^chr 23/chrX/' -e 's/^chr 24/chrY/' -e 's/^chr /chr/'

It's a long string of seds, but they run quickly.
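A minimal sketch of the convert, sort, convert-back idea wrapped in a function, assuming tab-separated input. The function name sort_sv_calls is hypothetical, and the chr2, chr10 and chrX rows are made up; the chr1 rows come from the question. The type column is compared as text with leading blanks skipped (-k5b,5) rather than numerically, because sort -n would treat both DEL and INV as zero and leave them unordered.

```shell
#!/bin/sh
# Hypothetical helper: sorts chromosome calls read from files or stdin.
# Step 1 (first sed): split "chr" from the name, map X -> 23 and Y -> 24.
# Step 2 (sort): chromosome number numerically, type as text, start numerically.
# Step 3 (second sed): undo the mapping and rejoin "chr" with the name.
sort_sv_calls() {
    sed -e 's/^chr/chr /' -e 's/^chr X/chr 23/' -e 's/^chr Y/chr 24/' "$@" |
        LC_ALL=C sort -k2,2n -k5b,5 -k3,3n |
        sed -e 's/^chr 23/chrX/' -e 's/^chr 24/chrY/' -e 's/^chr /chr/'
}

# printf reuses its format for each group of four arguments (one row each).
printf '%s\t%s\t%s\t%s\n' \
    chr1 9778939 10199603 DEL \
    chr1 8541961 8757598 DEL \
    chr1 3902285 4665319 INV \
    chr10 100 200 INV \
    chr2 104000 105000 DEL \
    chr2 10500 11000 DEL \
    chrX 500 900 DEL |
    sort_sv_calls
```

With this sample the chr1 rows come out as the two DEL calls (8541961, then 9778939) followed by the INV call, then chr2 with 10500 before 104000, then chr10 and chrX.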

Upvotes: 0

Raihan

Reputation: 10395

Assuming the columns in the file afile are separated by a single space character:

$ sed 's/chr/chr /' afile | sort -k2,2n -k5,5 -k3,3n | sed 's/chr /chr/'
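A sketch of this one-liner run on single-space-separated input. The wrapper name sort_space_separated is hypothetical, and the chr2, chr10 and chrX rows are made up. One observation, not part of the answer itself: the one-liner does not remap X and Y, so sort -n finds no number in those keys and treats them as zero, which places chrX and chrY before chr1.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner; input must use single spaces
# between columns, as the answer assumes.
sort_space_separated() {
    sed 's/chr/chr /' "$@" | LC_ALL=C sort -k2,2n -k5,5 -k3,3n | sed 's/chr /chr/'
}

printf '%s\n' \
    'chr10 100 200 INV' \
    'chr2 104000 105000 DEL' \
    'chr2 10500 11000 DEL' \
    'chr1 3902285 4665319 INV' \
    'chr1 8541961 8757598 DEL' \
    'chrX 500 900 DEL' |
    sort_space_separated
```

Numbered chromosomes come out in the intended order (chr1, chr2, chr10), with DEL before INV and starts compared numerically; only the sex chromosomes need the extra X/Y mapping used in the other answer.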

Upvotes: 2
