Reputation: 65
I have looked around on Stack Overflow and found some related answers, but I haven't been able to find a clear solution to my problem. I hope I am not asking a duplicate question.
Let us consider a file
cat > file << EOF
1 2 3 4, 5,, 6, 7
EOF
I want to use an arbitrary number of commas and spaces as the separator. With awk, setting the field separator with -F"[ ,]+", I obtain the desired result, i.e.:
awk -F"[ ,]+" '{print $1}' file --> 1
awk -F"[ ,]+" '{print $2}' file --> 2
awk -F"[ ,]+" '{print $3}' file --> 3
awk -F"[ ,]+" '{print $4}' file --> 4
awk -F"[ ,]+" '{print $5}' file --> 5
awk -F"[ ,]+" '{print $6}' file --> 6
awk -F"[ ,]+" '{print $7}' file --> 7
However, if I have leading spaces I have a problem. For example:
with one leading space
cat > file << EOF
 1 2 3 4, 5,, 6, 7
EOF
I obtain
awk -F"[ ,]+" '{print $1}' file -->
awk -F"[ ,]+" '{print $2}' file --> 1
awk -F"[ ,]+" '{print $3}' file --> 2
...
with two leading spaces the same
cat > file << EOF
  1 2 3 4, 5,, 6, 7
EOF
awk -F"[ ,]+" '{print $1}' file -->
awk -F"[ ,]+" '{print $2}' file --> 1
awk -F"[ ,]+" '{print $3}' file --> 2
...
and so forth.
However, the problem is not only with the spaces. For example with
cat > file << EOF
1,2,3,
EOF
I have
awk -F"," '{print $1}' file --> 1
awk -F"," '{print $2}' file --> 2
awk -F"," '{print $3}' file --> 3
awk -F"," '{print $4}' file -->
which is what I expect, but with
cat > file << EOF
,1,2,3
EOF
I get
awk -F"," '{print $1}' file -->
awk -F"," '{print $2}' file --> 1
awk -F"," '{print $3}' file --> 2
awk -F"," '{print $4}' file --> 3
and I do not understand why.
It seems that awk treats leading separators differently. I have probably misunderstood the regex syntax. In particular, I do not understand why leading spaces are handled properly with -F" ", whereas with -F"[ ]*" I get the same problem.
In conclusion, these are my questions: why am I obtaining those results for leading spaces or leading commas, and what is the correct syntax to treat any number of commas and spaces as field separators, regardless of whether they are leading or not?
Upvotes: 1
Views: 267
Reputation: 67497
Yes, there is some inconsistency, I guess for convenience.
The default delimiter ignores the leading/trailing spaces
$ echo " 1 2 " | awk '{for(i=1;i<=NF;i++) print i"--> "$i}'
1--> 1
2--> 2
Setting FS to a single space behaves the same:
$ echo " 1 2 " | awk -F' ' '{for(i=1;i<=NF;i++) print i"--> "$i}'
1--> 1
2--> 2
However, a character set (bracket expression) containing only a space does not:
$ echo " 1 2 " | awk -F'[ ]' '{for(i=1;i<=NF;i++) print i"--> "$i}'
1-->
2--> 1
3--> 2
4-->
There are three delimiters here, so awk assumes four fields, the first and last of which are empty. With a comma delimiter you never get the default trimming behavior, only this last one.
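To make the comma case from the question concrete, here is a minimal sketch that counts fields with NF. The helper name count_fields is hypothetical and exists only for this illustration. Both calls print 4: a leading comma yields an empty first field in exactly the same way that a trailing comma yields an empty last field.

```shell
# count_fields: hypothetical helper that prints how many fields awk
# sees in its argument when a single comma is the field separator.
count_fields() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

count_fields ',1,2,3'   # leading comma: empty $1, then 1, 2, 3
count_fields '1,2,3,'   # trailing comma: 1, 2, 3, then empty $4
```

So the two inputs in the question are handled symmetrically; the empty field just ends up at the opposite end of the record.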
If you want to mimic the default behavior for both comma and space, you need to write your own handling, something like this
$ echo " ,1 2," |
awk -F'[ ,]+' 'NF{if($1=="") {for(i=2;i<=NF;i++) $(i-1)=$i; NF--}
if($NF=="") NF--}
{for(i=1;i<=NF;i++) print i"--> "$i}'
1--> 1
2--> 2
Explanation: if the first field is empty, shift all fields one position to the left and reduce the field count by one; similarly, if the last field is empty, simply reduce the field count. The last statement prints the fields one per line, prefixed by their position number.
Update: to handle empty lines, add the guard NF before attempting to fix the fields.
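As a possible alternative to shifting fields by hand, one could strip the leading and trailing separators from the record before the fields are used. This is only a sketch, not the method described above: it relies on awk re-splitting the record whenever $0 is modified, and the helper name split_trimmed is made up for this example.

```shell
# split_trimmed: hypothetical helper that splits its argument on runs of
# spaces and commas, ignoring any such run at the start or end of the line.
split_trimmed() {
    printf '%s\n' "$1" |
    awk -F'[ ,]+' '{
        # Remove separators at both ends; changing $0 makes awk re-split it.
        gsub(/^[ ,]+|[ ,]+$/, "")
        for (i = 1; i <= NF; i++) print i "--> " $i
    }'
}

split_trimmed ' ,1 2,'
split_trimmed '  1 2 3 4, 5,, 6, 7'
```

With this sketch, a line that is empty or consists only of separators ends up with NF equal to 0 after the substitution, so the loop prints nothing and no separate guard is needed.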
Upvotes: 1