asachet

Reputation: 6921

single space as field separator with awk

I am dealing with a file where fields are separated by a single space.

awk interprets the FS " " as "one or more whitespace", which misreads my file when one of the fields is empty.

I tried using "a space not followed by a space" (" (?! )") as FS, but awk does not support negative lookahead. Simple Google queries like "single space field separator awk" only led me to the manual page explaining the special treatment of FS=" ". I must have missed the relevant manual page...

How can I use a single space as field separator with awk?

Upvotes: 19

Views: 78122

Answers (2)

mwfearnley

Reputation: 3649

To give a couple of helpful manpage references for this behaviour:

Default Field Splitting explains that " " is the default value, but carries a special meaning:

The default value of the field separator FS is a string containing a single space, " ".

If awk interpreted this value in the usual way, each space character would separate fields, so two spaces in a row would make an empty field between them.

The reason this does not happen is that a single space as the value of FS is a special case—it is taken to specify the default manner of delimiting fields.

Regexp Field Splitting explains how to split on a single space:

For a less trivial example of a regular expression, try using single spaces to separate fields the way single commas are used. FS can be set to "[ ]" (left bracket, space, right bracket).

This regular expression matches a single space and nothing else (see Regular Expressions).
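As a quick illustration of the two passages quoted above, here is a minimal sketch comparing the two separators on a line whose middle field is empty. The sample line and the variable names are made up for the example and are not part of the manual.

```shell
# Hypothetical sample: three fields separated by single spaces, middle one empty.
line='alice  42'

# FS="[ ]" is a regex matching exactly one space, so the empty field is kept.
bracket_nf=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')

# FS=" " is the special default: a run of whitespace counts as one separator.
default_nf=$(printf '%s\n' "$line" | awk -F' ' '{ print NF }')

echo "bracket: $bracket_nf, default: $default_nf"
```

With the bracket form the empty field is counted, so the first number is 3 and the second is 2.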

(Added the emphasis and paragraphing.)

Upvotes: 0

karakfa

Reputation: 67467

This should work:

$ echo 'a    b' | awk -F'[ ]' '{print NF}'
5

whereas this treats all contiguous whitespace as one separator:

$ echo 'a    b' | awk -F' ' '{print NF}'
2

Based on the comment, this needs special consideration: an empty string and whitespace as field values are very different things, so this data is probably not a good match for a whitespace-separated format.

I would suggest preprocessing with cut and changing the delimiter (--output-delimiter is a GNU cut option), for example:

$ echo 'a    b' | cut -d' ' -f1,3,5 --output-delimiter=,
a,,b
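If cut is not convenient, the same delimiter change can probably be done in awk alone. This is only a sketch, assuming every single space is a separator: assigning $1 to itself makes awk rebuild the record using OFS. The variable names are illustrative.

```shell
# Same sample input as above: 'a', three empty fields, then 'b'.
line='a    b'

# Split on single spaces, then rebuild the record with a comma as OFS.
# The assignment $1 = $1 forces awk to re-join the fields using OFS.
converted=$(printf '%s\n' "$line" | awk -F'[ ]' -v OFS=, '{ $1 = $1; print }')

echo "$converted"
```

Unlike the cut example, this keeps all five fields, so the result is a,,,,b rather than a,,b.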

Upvotes: 33
