zzz

Reputation: 153

awk extract a column and output a file named by the column header

I have a .txt file like this:

col1    col2    col3    col4
1   3   4   A
2   4   6   B
3   1   5   D
5   3   7   F

I want to extract every single column (i) after column 1 and output column1 and column i into a new file named by the header of column i.

That means that I will have three output files named "col2.reform.txt", "col3.reform.txt" and "col4.reform.txt" respectively.

For example, the output "col2.reform.txt" file will look like this:

col1    col2
1   3
2   4
3   1
5   3

I tried my code like this:

awk '{for (i=1; i <=NF; i++) print $1"\t"$i > ("{awk 'NR==1' $i}"".reform.txt")}' inputfile

And apparently the "{awk 'NR==1' $i}" part does not work, and I got a file named {awk 'NR==1' $i}.reform.txt.

How can I get the file name correctly? Thanks!

PS: how can I delete the file "{awk 'NR==1' $i}.reform.txt" in the terminal?

Edit: The column names above are just examples. I would prefer a command that takes each file name from the column's header, since my real file uses different words as headers.

Upvotes: 2

Views: 651

Answers (3)

Ed Morton

Reputation: 203229

$ awk '
    NR==1 { split($0,hdrs) }
    {
        for (i=2; i<=NF; i++) {
            out = hdrs[i]".reform.txt"
            if (FNR==1) {
                printf "" " > " out    # to erase existing file contents if any
            }
            print $1, $i " >> " out
            close(out)
        }
    }
' file
 > col2.reform.txtcol1 col2 >> col2.reform.txt
 > col3.reform.txtcol1 col3 >> col3.reform.txt
 > col4.reform.txtcol1 col4 >> col4.reform.txt
1 3 >> col2.reform.txt
1 4 >> col3.reform.txt
1 A >> col4.reform.txt
2 4 >> col2.reform.txt
2 6 >> col3.reform.txt
2 B >> col4.reform.txt
3 1 >> col2.reform.txt
3 5 >> col3.reform.txt
3 D >> col4.reform.txt
5 3 >> col2.reform.txt
5 7 >> col3.reform.txt
5 F >> col4.reform.txt

Just change " > " to > and " >> " to >> when you're done testing and want to actually generate the output files.
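As a sketch of what that substitution might look like, here is the same script with real redirections. It recreates the question's sample as a tab-separated file named `file` (the separator is an assumption, since the post only shows whitespace) and writes into the current directory, so it is best tried in an empty scratch directory.

```shell
# Recreate the sample input from the question (tab-separated is an assumption).
printf 'col1\tcol2\tcol3\tcol4\n1\t3\t4\tA\n2\t4\t6\tB\n3\t1\t5\tD\n5\t3\t7\tF\n' > file

awk '
    NR==1 { split($0,hdrs) }            # remember the header names by column number
    {
        for (i=2; i<=NF; i++) {
            out = hdrs[i] ".reform.txt"
            if (FNR==1) {
                printf "" > out         # truncate any existing file of that name
            }
            print $1, $i >> out         # append column 1 and column i
            close(out)                  # keep only one output file open at a time
        }
    }
' file

cat col2.reform.txt
```

With the sample input, `col2.reform.txt` should contain the header line `col1 col2` followed by the four data rows.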

Upvotes: 2

RavinderSingh13

Reputation: 133458

Based on your shown samples, could you please try the following. Written and tested with the shown samples in GNU awk.

awk '
FNR==1{
  for(i=1;i<=NF;i++){
    heading[i]=$i
  }
  next
}
{
  for(i=2;i<=NF;i++){
    close(outFile)
    outFile="col"i".reform.txt"
    if(!indVal[i]++){ print heading[1],heading[i] > (outFile) }   
    print $1,$i >> (outFile)
  }
}
' Input_file

Output files will be created with names such as col2.reform.txt, col3.reform.txt, col4.reform.txt and so on.

A sample of the col2.reform.txt content is as follows:

cat col2.reform.txt
col1 col2
1 3
2 4
3 1
5 3

Explanation: a detailed, commented version of the code above.

awk '                             ##Start of the awk program.
FNR==1{                           ##If this is the first line (the header), do the following.
  for(i=1;i<=NF;i++){             ##Loop through all fields of the current line.
    heading[i]=$i                 ##Store each header field in the heading array, indexed by field number i.
  }
  next                            ##Skip the remaining statements for the header line.
}
{
  for(i=2;i<=NF;i++){             ##For every remaining line, loop from the 2nd field to the last field.
    close(outFile)                ##Close the previously used outFile to avoid a "too many open files" error.
    outFile="col"i".reform.txt"   ##Build the output file name for field i.
    if(!indVal[i]++){ print heading[1],heading[i] > (outFile) }
                                  ##The first time field number i is seen, write the 1st heading and the i-th heading to outFile.
    print $1,$i >> (outFile)      ##Append the 1st field and the current field to outFile.
  }
}
' Input_file                      ##Name of the input file.
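The code above builds the file name from the field number ("col"i), which matches the shown samples only because the headers happen to be col2, col3 and so on. Since the edited question asks for names taken from the real header words, one possible adaptation is sketched below. The header words (id, height, weight, grade) and the tab separator are made-up sample data, and the extra close() after writing the header is a precaution added here, not part of the original code.

```shell
# Hypothetical sample input whose headers are arbitrary words, not col1..col4.
printf 'id\theight\tweight\tgrade\n1\t3\t4\tA\n2\t4\t6\tB\n' > Input_file

awk '
FNR==1{
  for(i=1;i<=NF;i++){ heading[i]=$i }   ## remember each header word by field number
  next
}
{
  for(i=2;i<=NF;i++){
    outFile=heading[i]".reform.txt"     ## file name comes from the header word
    if(!indVal[i]++){                   ## first data row for this column:
      print heading[1],heading[i] > (outFile)   ## start the file with its two-column header
      close(outFile)
    }
    print $1,$i >> (outFile)            ## append column 1 and column i
    close(outFile)                      ## avoid keeping many files open
  }
}
' Input_file

cat height.reform.txt
```

This should produce height.reform.txt, weight.reform.txt and grade.reform.txt. Note that a file containing only a header line and no data rows produces no output files with this approach.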

Upvotes: 5

karakfa

Reputation: 67467

Here's a similar one:

$ awk 'NR==1 {n=split($0,h)} 
             {for(i=2;i<=n;i++) print $1,$i > (h[i]".reform.txt")}' file

==> col2.reform.txt <==
col1 col2
1 3
2 4
3 1
5 3

==> col3.reform.txt <==
col1 col3
1 4
2 6
3 5
5 7

==> col4.reform.txt <==
col1 col4
1 A
2 B
3 D
5 F

Upvotes: 2
