Reputation: 606
I have rows like so:
rs6605071 chr1:962943 XM_017002478.2 stuff1,stuff2 morestuff
rs6605071 chr1:962943 XM_017002479.1 stuff1,stuff2,stuff3,stuff4,stuff5 morestuff
rs6605071 chr1:962943 XR_001737138.1 stuff1,stuff2,stuff3 morestuff
rs6605071 chr1:962943 XR_001737478.1 stuff1,stuff2,stuff3,stuff4 morestuff
rs6605071 chr1:962943 NC_426604.3 stuff1 morestuff
rs6605071 chr1:962943 NC_426605.3 stuff1 morestuff
I would like to sort my rows by the 4th column for the desired output:
rs6605071 chr1:962943 XM_017002479.1 stuff1,stuff2,stuff3,stuff4,stuff5 morestuff
rs6605071 chr1:962943 XR_001737478.1 stuff1,stuff2,stuff3,stuff4 morestuff
rs6605071 chr1:962943 XR_001737138.1 stuff1,stuff2,stuff3 morestuff
rs6605071 chr1:962943 XM_017002478.2 stuff1,stuff2 morestuff
rs6605071 chr1:962943 NC_426604.3 stuff1 morestuff
rs6605071 chr1:962943 NC_426605.3 stuff1 morestuff
What is the best approach to achieve this result in bash?
Edit 1: Column 4 shouldn't be sorted alphabetically. It has to be sorted by the number of comma-delimited values it contains, largest first.
Thank you in advance
Upvotes: 0
Views: 52
Reputation: 1695
So this is a bit hacky, but it works. I can't tell what your delimiter is (tabs or spaces), but something like this will work, and it is fairly easy to adapt:
awk '{print gsub(/,/, ",", $4), $1, $2, $3, $4, $5}' asdfasdf.txt | sort -s -k1,1nr | cut -d' ' -f2-
Now, there has got to be a way to do this entirely in awk, and I'm always in awe of the awk experts who know it so well.
I hope one of them puts together a more elegant command, but for now, this will help in a pinch.
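In case it helps, here is a minimal sketch of the same decorate, sort, undecorate idea wrapped in a function. It assumes whitespace-separated fields (tabs or spaces), that the comma-separated list is always in column 4, and that your sort supports -s (GNU and BSD sort do, but it is not strictly POSIX). The function name sort_by_col4_count is made up for illustration.

```shell
# sort_by_col4_count is a hypothetical helper name, not an existing tool.
# Steps:
#   1. awk prefixes each line with the number of comma-separated values in
#      column 4 plus a tab; split() returns that count and leaves $0 untouched,
#      so the original delimiters survive.
#   2. sort orders numerically (n), descending (r), on that count only;
#      -s keeps the input order for rows with the same count.
#   3. cut drops the count column again.
sort_by_col4_count() {
    awk '{ print split($4, parts, ",") "\t" $0 }' "$@" |
        sort -s -t "$(printf '\t')" -k1,1nr |
        cut -f2-
}

# Usage (reads the named file, or stdin if no file is given):
#   sort_by_col4_count asdfasdf.txt
```

On the sample rows from the question, this should put the five-value row first and keep NC_426604.3 ahead of NC_426605.3, since ties stay in input order. Because the sort key is numeric, a row with 10 values also lands ahead of a row with 9.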
Upvotes: 1