Reputation: 2593
I have large scale tab-delimited files (a couple of hundreds), and but the order of the columns is different across the different files( the same columns, but different locations). Hence, I need to reorder all the columns in all the files and write it back in tab-deliminated format. I would like to write a shell script that takes a specified order of columns and reorder all the columns in all the files and write it back. Can someone help me with it?
Here is how the header of my files looks like: file1)
sLS72 chrX
A B E C F H
2 1 4 5 7 8
0 0 0 0 0 0
and the header of my second file:
S721 chrX
A E B F H C
12 11 2 3 4 1
0 0 0 0 0 0
here is the order of the columns that I want to achieve:
Order=[A ,B ,C ,E,F,H]
and here is the expected outputs for each file based on this ordering:
sLS72 chrX
A B C E F H
2 1 5 4 7 8
0 0 0 0 0 0
file 2:
S721 chrX
A B C E F H
12 2 1 11 3 4
0 0 0 0 0 0
I was trying to use awk:
awk -F'\t' '{s2=$A; $3=$B; $4=$C; $5=$E; $1=s}1' OFS='\t' in file
but the point is the, first, the order of columns are different in different files, and second, the names of the columns start from the second line of the file. In order words, first line is the header, I don't want to change it, but the second line is the colnames of the columns, so I want to order all files based on that. it's kind of tricky
Upvotes: 0
Views: 144
Reputation: 67507
$ awk -v order="A B C E F H" '
BEGIN {n=split(order,ho)}
FNR==1 {print; next}
FNR==2 {for(i=1;i<=NF;i++) hn[$i]=i}
{for(i=1;i<=n;i++) printf "%s",$hn[ho[i]] (i==n?ORS:OFS)}' file1 > tmp && mv tmp file1
sLS72 chrX
A B C E F H
0 0 0 0 0 0
0 0 0 0 0 0
if working on multiple files at the same time, change it to
$ awk -v ...
{... printf "%s",$hn[ho[i]] (i==n?ORS:OFS) > (FILENAME"_reordered") }' dir/files*
and do a mass rename afterwards. Alternative is run the original script if a loop for each file.
Upvotes: 3