Reputation: 153
I have a .txt file like this:
col1 col2 col3 col4
1 3 4 A
2 4 6 B
3 1 5 D
5 3 7 F
For every column i after column 1, I want to write column 1 and column i to a new file named after the header of column i.
That means that I will have three output files named "col2.reform.txt", "col3.reform.txt" and "col4.reform.txt" respectively.
For example, the output "col2.reform.txt" file will look like this:
col1 col2
1 3
2 4
3 1
5 3
I tried my code like this:
awk '{for (i=1; i <=NF; i++) print $1"\t"$i > ("{awk 'NR==1' $i}"".reform.txt")}' inputfile
And apparently the "{awk 'NR==1' $i}" part does not work, and I got a file named {awk 'NR==1' $i}.reform.txt.
How can I get the file name correctly? Thanks!
PS: how can I delete the file "{awk 'NR==1' $i}.reform.txt" in the terminal?
Edited: The column names above are just an example. I would prefer commands that read the name from each column's header, as my real file uses different words as headers.
Upvotes: 2
Views: 651
Reputation: 203229
$ awk '
NR==1 { split($0,hdrs) }
{
    for (i=2; i<=NF; i++) {
        out = hdrs[i]".reform.txt"
        if (FNR==1) {
            printf "" " > " out     # to erase existing file contents if any
        }
        print $1, $i " >> " out
        close(out)
    }
}
' file
> col2.reform.txtcol1 col2 >> col2.reform.txt
> col3.reform.txtcol1 col3 >> col3.reform.txt
> col4.reform.txtcol1 col4 >> col4.reform.txt
1 3 >> col2.reform.txt
1 4 >> col3.reform.txt
1 A >> col4.reform.txt
2 4 >> col2.reform.txt
2 6 >> col3.reform.txt
2 B >> col4.reform.txt
3 1 >> col2.reform.txt
3 5 >> col3.reform.txt
3 D >> col4.reform.txt
5 3 >> col2.reform.txt
5 7 >> col3.reform.txt
5 F >> col4.reform.txt
Just change " > " to > and " >> " to >> when you're done testing and want to actually generate the output files.
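With those two changes applied, the script would look roughly like the sketch below. The input file name "file" and its contents are simply the sample from the question, recreated here so the block can be run as-is.

```shell
# Recreate the sample input from the question ("file" is just a placeholder name).
printf '%s\n' 'col1 col2 col3 col4' '1 3 4 A' '2 4 6 B' '3 1 5 D' '5 3 7 F' > file

awk '
NR==1 { split($0,hdrs) }            # remember the header names
{
    for (i=2; i<=NF; i++) {
        out = hdrs[i]".reform.txt"
        if (FNR==1) {
            printf "" > out         # truncate any existing file
        }
        print $1, $i >> out         # append column 1 and column i
        close(out)                  # avoid running out of open files
    }
}
' file

cat col2.reform.txt
```

The final cat should show the same five lines as the col2.reform.txt example in the question.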
Upvotes: 2
Reputation: 133458
Based on your shown samples, could you please try the following. Written with the shown samples in GNU awk.
awk '
FNR==1{
  for(i=1;i<=NF;i++){
    heading[i]=$i
  }
  next
}
{
  for(i=2;i<=NF;i++){
    close(outFile)
    outFile="col"i".reform.txt"
    if(!indVal[i]++){ print heading[1],heading[i] > (outFile) }
    print $1,$i >> (outFile)
  }
}
' Input_file
Output files will be created with names such as col2.reform.txt, col3.reform.txt, col4.reform.txt and so on.
A sample of the col2.reform.txt content is as follows:
cat col2.reform.txt
col1 col2
1 3
2 4
3 1
5 3
Explanation: Adding detailed explanation for above.
awk '                                 ##Starting awk program from here.
FNR==1{                               ##Checking condition: if this is the first line then do the following.
  for(i=1;i<=NF;i++){                 ##Traversing through all fields of the current line.
    heading[i]=$i                     ##Creating heading array with index i and value of the current field.
  }
  next                                ##next will skip all further statements from here.
}
{
  for(i=2;i<=NF;i++){                 ##Traversing from the 2nd field to the last field of every remaining line.
    close(outFile)                    ##Closing outFile to avoid too many open files error.
    outFile="col"i".reform.txt"       ##Creating outFile which holds the output file name.
    if(!indVal[i]++){ print heading[1],heading[i] > (outFile) }
                                      ##If i has NOT been seen in indVal yet, print the 1st element of heading and the current element of heading into outFile.
    print $1,$i >> (outFile)          ##Printing 1st field and current field values to the output file.
  }
}
' Input_file                          ##Mentioning Input_file name here.
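Since the edit to the question says the real headers are arbitrary words, a variant that builds the file name from the stored header instead of the literal "col"i may be closer to what is needed. This is only a sketch: the header names id, height, weight and label below are made up for illustration, and the extra close() calls are there so that > and >> are never mixed on a file that is still open.

```shell
# Hypothetical input with non-"colN" headers, just for illustration.
printf '%s\n' 'id height weight label' '1 3 4 A' '2 4 6 B' > Input_file

awk '
FNR==1{
  for(i=1;i<=NF;i++){ heading[i]=$i }   # store every header name
  next
}
{
  for(i=2;i<=NF;i++){
    outFile=heading[i]".reform.txt"      # name the file after the header itself
    if(!indVal[i]++){                    # first data row for this column:
      print heading[1],heading[i] > (outFile)   # truncate and write the header line
      close(outFile)
    }
    print $1,$i >> (outFile)             # append the data line
    close(outFile)                       # keep the number of open files small
  }
}
' Input_file

cat height.reform.txt
```

With this input the files would be named height.reform.txt, weight.reform.txt and label.reform.txt.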
Upvotes: 5
Reputation: 67467
here's a similar one...
$ awk 'NR==1 {n=split($0,h)}
{for(i=2;i<=n;i++) print $1,$i > (h[i]".reform.txt")}' file
==> col2.reform.txt <==
col1 col2
1 3
2 4
3 1
5 3
==> col3.reform.txt <==
col1 col3
1 4
2 6
3 5
5 7
==> col4.reform.txt <==
col1 col4
1 A
2 B
3 D
5 F
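As for the PS, one way to remove a file whose name contains spaces, quotes and a $ is to quote the fixed parts of the name and let a glob match the awkward middle. The sketch below first recreates the file under the name reported in the question, so it can be run on its own; the exact name on your disk may differ slightly, which is why the pattern only pins down the beginning and the end.

```shell
# Recreate the stray file (name as reported in the question) so the block is self-contained.
touch -- "{awk 'NR==1' \$i}.reform.txt"

# Quote the literal parts and let * match the middle (spaces, quotes, $).
# The leading ./ and the -- keep rm from treating an odd name as an option.
rm -- ./'{awk'*'}.reform.txt'
```

Running ls with the same pattern first shows what would be removed, and rm -i asks for confirmation before each deletion.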
Upvotes: 2