Reputation: 51
In AWK, a whitespace field separator, such as tab when using FS = "\t", matches one or more occurrences by default. Therefore, if you read a tab-separated file with null values in some columns (other than the last), AWK skips over them. For example:
1 "\t" 2 "\t" "" "\t" 4 "\t" 5
$3 would refer to 4, not the null "", even though there are clearly two tabs.
How can I specify the field separator as exactly one tab, so that $4 refers to 4 and not 5?
Upvotes: 5
Views: 19125
Reputation: 37258
Using printf here, because not every shell's echo expands \t into a tab:
printf '1 "\t" 2 "\t" "" "\t" 4 "\t" 5\n' | awk -F"\t" '{print "$3="$3 , "$4="$4}'
output
$3=" "" " $4=" 4 "
The double quotes are part of the data, so they end up inside the fields. Remove them from your original string and you get
printf '1\t2\t\t4\t5\n' | awk -F"\t" '{print "$3="$3 , "$4="$4}'
output
$3= $4=4
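To double-check that the empty column really is kept, it can help to print NF and every field in turn. This is only a minimal sketch: show_fields is a made-up helper name, and the input is the same five-column sample as above.

```shell
# List every field of a tab-separated line, with FS set to a single tab.
# show_fields is a hypothetical helper name used only for this illustration.
# printf is used because echo's handling of \t differs between shells.
show_fields() {
    printf '1\t2\t\t4\t5\n' | awk -F'\t' '{
        printf "NF=%d\n", NF
        for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i
    }'
}

show_fields
```

This should report NF=5, with $3 shown as an empty pair of brackets and $4 as [4].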
You're right that the default FS is whitespace, with the caveat that any run of adjacent space and tab characters counts as a single separator. Once FS is set to "\t", every tab is its own separator, so empty fields are kept. To use just "\t" as your FS, set it with a command-line argument as above, or reset FS explicitly, usually in a BEGIN block, like
printf '1 "\t" 2 "\t" "" "\t" 4 "\t" 5\n' | awk 'BEGIN{FS="\t"}{print "$3="$3 , "$4="$4}'
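The difference between the default FS and a tab-only FS is easiest to see side by side. The sketch below assumes the unquoted sample line from earlier; default_fs and tab_fs are hypothetical helper names, not anything built into awk.

```shell
# Default FS: the run of two tabs is treated as one separator,
# so the line has 4 fields and $3 is 4.
default_fs() {
    printf '1\t2\t\t4\t5\n' | awk '{ print NF, $3 }'
}

# FS set to a single tab in BEGIN: the empty column is kept,
# so the line has 5 fields and $4 is 4.
tab_fs() {
    printf '1\t2\t\t4\t5\n' | awk 'BEGIN { FS = "\t" } { print NF, $4 }'
}

default_fs   # prints: 4 4
tab_fs       # prints: 5 4
```

Setting FS in BEGIN matters because it has to be in place before the first record is read and split.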
IHTH
Upvotes: 5