user2806363

Reputation: 2593

How to reorder the columns of hundreds of tab-delimited files in Linux?

I have a couple of hundred large tab-delimited files. They all contain the same columns, but the column order differs from file to file. I need to reorder the columns in every file and write the result back in tab-delimited format. I would like to write a shell script that takes a specified column order, reorders the columns of each file accordingly, and writes the file back. Can someone help me with this?

Here is what the header of my first file (file1) looks like:

sLS72   chrX
A   B   E   C   F   H
2   1   4   5   7   8
0   0   0   0   0   0

and here is the header of my second file (file2):

S721    chrX
A   E   B   F   H   C
12  11  2   3   4   1
0   0   0   0   0   0

Here is the column order that I want to achieve:

Order=[A, B, C, E, F, H]

and here is the expected output for each file based on this ordering:

file 1:

sLS72   chrX
A   B   C   E   F   H
2   1   5   4   7   8
0   0   0   0   0   0

file 2:

S721    chrX
A   B   C   E   F   H
12  2   1   11  3   4
0   0   0   0   0   0

I was trying to use awk:

awk -F'\t' '{s2=$A; $3=$B; $4=$C; $5=$E; $1=s}1' OFS='\t' in file

but the problem is that, first, the order of the columns differs between files and, second, the column names start on the second line of the file. In other words, the first line is a header that I don't want to change, and the second line holds the column names, so I want to reorder every file based on that line. It's kind of tricky.

Upvotes: 0

Views: 144

Answers (1)

karakfa

Reputation: 67507

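$ awk -v order="A B C E F H" '
    BEGIN  {FS=OFS="\t"; n=split(order,ho," ")}   # tab-delimited in and out; ho[i] = i-th wanted column name
    FNR==1 {print; next}                          # first line is the title: copy it unchanged
    FNR==2 {for(i=1;i<=NF;i++) hn[$i]=i}          # second line: map each column name to its position
           {for(i=1;i<=n;i++) printf "%s%s",$hn[ho[i]],(i==n?ORS:OFS)}' file1 > tmp && mv tmp file1

$ cat file1
sLS72   chrX
A   B   C   E   F   H
2   1   5   4   7   8
0   0   0   0   0   0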

If you are working on multiple files at the same time, change it to

$ awk -v ...
        {... printf "%s",$hn[ho[i]] (i==n?ORS:OFS) > (FILENAME"_reordered") }' dir/files*

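and do a mass rename afterwards. Note that the print for the first line needs the same redirection, otherwise the title line goes to standard output instead of the new file. The alternative is to run the original script in a loop, once for each file.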
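
The loop variant could look roughly like the sketch below. It is only an illustration under a few assumptions: the files are truly tab-delimited, every file contains all of the requested column names, and the function name reorder_columns, the array names want and pos, and the dir/files* glob are placeholders rather than anything from the question.

```shell
#!/bin/sh
# Reorder the columns of one tab-delimited file in place.
#   $1 = wanted column order, space-separated (e.g. "A B C E F H")
#   $2 = path of the file to rewrite
reorder_columns() {
    order=$1
    file=$2
    tmp=$(mktemp) || return 1
    awk -v order="$order" '
        BEGIN  { FS = OFS = "\t"; n = split(order, want, " ") }
        FNR==1 { print; next }                              # title line: copy unchanged
        FNR==2 { for (i = 1; i <= NF; i++) pos[$i] = i }    # column name -> position
               { for (i = 1; i <= n; i++)
                     printf "%s%s", $(pos[want[i]]), (i == n ? ORS : OFS) }
    ' "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}

# Placeholder glob: adjust it to match the real files.
for f in dir/files*; do
    [ -f "$f" ] || continue          # skip if the glob matched nothing
    reorder_columns "A B C E F H" "$f"
done
```

Because awk is started once per file, the name-to-position map is rebuilt from each file's own second line. If a requested name is missing from a file, pos[...] is empty and awk substitutes the whole line for that field, so it may be worth validating the column names first.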

Upvotes: 3