Reputation: 8798
I have a list of data with four columns, like below:
chr1 9778939 10199603 DEL
chr1 143804138 143808614 DEL
chr1 8541961 8757598 DEL
chr1 141480516 141909199 INV
chr1 3902285 4665319 INV
chr1 10212548 10467934 DEL
chr1 225767517 226730696 INV
chr1 10807309 11011343 DEL
chr1 23663773 23957334 DEL
chr1 4468523 4665322 DEL
chr1 24458662 24704306 DEL
....
....
chr2
....
....
chr10
....
....
chr22
....
....
chrX
....
....
chrY
....
....
I hope to:
First, sort by chromosome in the order chr1, chr2, chr3, ... up to chr22, chrX, chrY. If I simply use sort -n, it sorts as chr10, chr1, chr11, and so on. I want to sort by the numeric value in the first column.
Then, within each chromosome (chr1, chr2, ...), how can I sort by the last column, that is, "DEL" or "INV"?
Then sort by the second column, again by numeric value. For example, 104000 should go after 10500 because 104000 > 10500, rather than before it based on comparing the third digits (4 and 5).
Thanks. I hope I've made it clear.
Upvotes: 1
Views: 4398
Reputation: 71
Convert X and Y to 23 and 24 to sort numerically, and then back after the sort.
cat file | sed 's/chr/chr /' | sed 's/ X/ 23/' | sed 's/ Y/ 24/' | sort -k 2,2n -k 5,5 -k 3,3n | sed 's/chr 23/chrX/' | sed 's/chr 24/chrY/' | sed 's/chr /chr/'
It's a long string of seds, but they run quickly.
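To make the convert-and-restore idea concrete, here is a minimal sketch on a small sample. The file name sample.txt, the function name sort_chr, and the rows for chr2, chr10, chrX and chrY are made up for illustration; only the chr1 rows come from the question. It assumes single-space separators and folds the separate sed calls into one invocation with several -e expressions.

```shell
# Hypothetical sample; the chr2, chr10, chrX and chrY rows are invented for illustration.
cat > sample.txt <<'EOF'
chr10 500 900 DEL
chrY 100 200 INV
chr1 9778939 10199603 DEL
chr2 300 400 DEL
chrX 100 200 DEL
chr1 3902285 4665319 INV
chr1 8541961 8757598 DEL
EOF

# sort_chr is a hypothetical helper name, not part of any tool.
sort_chr() {
  # Split "chrN" into "chr N" and map X/Y to 23/24 so the chromosome sorts numerically.
  sed -e 's/^chr/chr /' -e 's/^chr X /chr 23 /' -e 's/^chr Y /chr 24 /' "$1" |
    # Field 2: chromosome (numeric), field 5: DEL/INV (text), field 3: start (numeric).
    sort -k2,2n -k5,5 -k3,3n |
    # Restore X/Y and glue "chr" back onto the chromosome name.
    sed -e 's/^chr 23 /chrX /' -e 's/^chr 24 /chrY /' -e 's/^chr /chr/'
}

sort_chr sample.txt
```

On this sample the chr1 rows come first (both DEL rows by start position, then the INV row), followed by chr2, chr10, chrX and chrY.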
Upvotes: 0
Reputation: 10395
Assuming the columns in the file afile are separated by a single space character:
$ cat afile | sed 's/chr/chr /' | sort -k2,2n -k5,5 -k3,3n | sed 's/chr /chr/'
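One caveat: a numeric key treats non-numeric values such as X and Y as 0, so with the command above chrX and chrY would likely sort before chr1. If your sort supports version ordering (the -V key modifier is a GNU extension, not POSIX, so this is an assumption about your environment), a sketch like the following may avoid the sed round trip entirely. The file name sample_v.txt and the non-chr1 rows are made up for illustration.

```shell
# Hypothetical sample; the chr2, chr10, chrX and chrY rows are invented for illustration.
printf '%s\n' \
  'chr10 500 900 DEL' \
  'chrY 100 200 INV' \
  'chr1 9778939 10199603 DEL' \
  'chr2 300 400 DEL' \
  'chrX 100 200 DEL' \
  'chr1 3902285 4665319 INV' \
  'chr1 8541961 8757598 DEL' > sample_v.txt

# Field 1: chromosome in version order (chr2 before chr10, digits before letters),
# field 4: DEL/INV as text, field 2: start position as a number.
sort -k1,1V -k4,4 -k2,2n sample_v.txt > sorted_v.txt
cat sorted_v.txt
```

Here the original four columns are left untouched, so the key numbers refer to the file's own fields rather than the shifted ones produced by inserting a space after chr.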
Upvotes: 2