Reputation: 167
I have the following file with tabs as the field separators:
header1 header2 header3 header4 header5
1field1 1field2 1field3 1field4 1field5
2field1 2field2 2field3 2field4 2field5
3field1 3field2 3field3 3field4 3field5
4field1 4field2 4field3 4field4 4field5
and would like to write each line to a new file (skipping the first line). Each new file is named from the 1st and 5th fields, joined by an underscore. The file from the first data line (line 2 of the input) would be named "1field1_1field5.txt" and contain all the fields from that line, and so on. I have the following awk command, which prints the correct filenames to standard output:
awk -v FS='\t' -v OFS='_' 'NR>1 {print ($1,$5 ".txt") }'
but when I try to write each line to the file with that name instead:
awk -v FS='\t' -v OFS='_' 'NR>1 {print > ($1,$5 ".txt") }'
I get the following error
awk: cmd. line:1: NR>1 { print > ($1,$5 ".txt") }
awk: cmd. line:1: ^ syntax error
I have copied/pasted from 10 different other articles to get this far, but I'm stuck on how my formatting is wrong.
Upvotes: 3
Views: 77
Reputation: 36700
In this case
awk -v FS='\t' -v OFS='_' 'NR>1 {print ($1,$5 ".txt") }'
you are calling the print statement with 2 arguments: $1, and the concatenation of $5 and ".txt". Whilst in
awk -v FS='\t' -v OFS='_' 'NR>1 {print > ($1,$5 ".txt") }'
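there is no argument list for the comma to separate: after > awk expects a single expression that gives the file name. You might use the sprintf string function, which formats its arguments and returns the resulting string, in the following way: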
awk -v FS='\t' -v OFS='_' 'NR>1 {print > sprintf("%s%s%s.txt",$1,OFS,$5) }'
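A minimal sketch to try this end to end, assuming a throwaway directory from mktemp and a sample file called input.tsv (both are placeholders of mine, not from the question):

```shell
# Throwaway directory so the generated files are easy to inspect.
workdir=$(mktemp -d)

# Hypothetical sample input: tab-separated, with one header line.
printf 'header1\theader2\theader3\theader4\theader5\n'  > "$workdir/input.tsv"
printf '1field1\t1field2\t1field3\t1field4\t1field5\n' >> "$workdir/input.tsv"
printf '2field1\t2field2\t2field3\t2field4\t2field5\n' >> "$workdir/input.tsv"

# Run awk inside the directory so the relative file names land there.
(
  cd "$workdir" || exit 1
  # sprintf builds the file name; print with no arguments writes $0 as-is.
  awk -v FS='\t' -v OFS='_' 'NR>1 { print > sprintf("%s%s%s.txt", $1, OFS, $5) }' input.tsv
)

ls "$workdir"
```

This should leave 1field1_1field5.txt and 2field1_2field5.txt next to input.tsv, each holding its full line still separated by tabs, since no field is modified and $0 is therefore not rebuilt with OFS.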
Upvotes: 3
Reputation: 204381
Using any awk, you should do the following if each combination of $1 and $5 occurs on only one row:
awk -F '\t' 'NR>1 { out=$1 "_" $5 ".txt"; print > out; close(out) }'
and this otherwise:
awk -F '\t' 'NR>1 { out=$1 "_" $5 ".txt"; if (!seen[out]++) printf "" > out; print >> out; close(out) }'
The close() is so you don't end up with a "too many open files" error if your input is large. The printf "" > out is to empty/init the output file in case it already existed before your script ran.
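To see the second variant handle repeated keys and a leftover file, here is a small sketch. The input rows, the file name input.tsv, and the stale a_b.txt are all made up for illustration:

```shell
workdir=$(mktemp -d)

# Hypothetical input in which the first and third data rows share $1 and $5.
printf 'h1\th2\th3\th4\th5\n'  > "$workdir/input.tsv"
printf 'a\tx1\ty\tz\tb\n'     >> "$workdir/input.tsv"
printf 'c\tx2\ty\tz\td\n'     >> "$workdir/input.tsv"
printf 'a\tx3\ty\tz\tb\n'     >> "$workdir/input.tsv"

# A stale file from an imagined earlier run; it should be emptied first.
echo 'stale' > "$workdir/a_b.txt"

(
  cd "$workdir" || exit 1
  awk -F '\t' 'NR>1 {
      out = $1 "_" $5 ".txt"
      if (!seen[out]++) printf "" > out   # first use of this name: create or truncate
      print >> out                        # append the current row
      close(out)                          # release the file before the next row
  }' input.tsv
)

cat "$workdir/a_b.txt"
```

After the run, a_b.txt should hold the two "a ... b" rows in input order with the "stale" line gone, and c_d.txt should hold the single "c ... d" row.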
With GNU awk you could get away without the close():
awk -F '\t' 'NR>1 { print > ($1 "_" $5 ".txt") }'
but the script will slow down significantly for large input as it tries to internally handle opening/closing all of the output files as-needed.
Upvotes: 4
Reputation: 782158
The expression ($1,$5 ".txt") is not valid.
You may be thinking that the comma operator concatenates its arguments using OFS as the separator. But that's not how OFS works. It's used as the separator when you give multiple arguments to the print command, but isn't used in expressions.
In expressions, the only concatenation operator is putting sub-expressions next to each other. If you want to concatenate with OFS you must write it explicitly:
awk -v FS='\t' -v OFS='_' 'NR>1 {print > ($1 OFS $5 ".txt") }'
You could also just write the literal instead of using OFS:
awk -v FS='\t' -v OFS='_' 'NR>1 {print > ($1 "_" $5 ".txt") }'
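The difference between the comma in print and plain concatenation can be checked with a quick sketch. The one-line input and the shell variable names here are just placeholders for illustration:

```shell
# Hypothetical one-line input with two tab-separated fields.
line=$(printf 'left\tright')

# Comma in a print argument list: the arguments are joined with OFS.
with_comma=$(printf '%s\n' "$line" | awk -v FS='\t' -v OFS='_' '{ print $1, $2 }')

# Juxtaposition: plain string concatenation, OFS is not involved.
juxtaposed=$(printf '%s\n' "$line" | awk -v FS='\t' -v OFS='_' '{ print $1 $2 }')

# OFS written explicitly inside the concatenation.
explicit=$(printf '%s\n' "$line" | awk -v FS='\t' -v OFS='_' '{ print $1 OFS $2 }')

printf '%s\n' "$with_comma" "$juxtaposed" "$explicit"
```

This prints left_right, leftright and left_right, in that order. A file name after > is an ordinary expression, so only the second and third forms are available there.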
Upvotes: 4