Reputation: 579
I'm trying to understand a particular line of code from a Unix talk, and can't seem to understand what the awk portion is doing.
The full line is:
man ls | col -b | grep '^[[:space:]]*ls \[' | awk -F '[][]' '{print $2}'
The text passed to awk (if for some reason you don't have the man program) is:
ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]
Somehow, awk is able to pull out just the list of options to ls, but I can't understand how this regex, [][], actually works or what it matches.
My best guess is that the outer brackets denote a character class whose contents are ][. If that's the case, why can't the inner brackets be written as []? Is it because a pair of brackets, [[]], has a different meaning in awk?
Thanks in advance!
Upvotes: 1
Views: 112
Reputation: 15772
Your hunch about character classes is correct. If you want certain characters to be field separators, you can list them between brackets. Using awk -F '[abc]' ... would specify the characters a, b, and c as separators. Order is irrelevant; you could use awk -F '[cab]' ... and get the same results.
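A quick way to check this is to feed awk a made-up string. This is a sketch that assumes any POSIX awk; the input is only an illustration. Both lists split on the same three letters, so both commands print the same fields:

```shell
# Split on any single a, b, or c; each occurrence ends one field.
echo 'xaybzcw' | awk -F '[abc]' '{print $1, $2, $3, $4}'   # prints: x y z w

# Same characters listed in a different order: identical result.
echo 'xaybzcw' | awk -F '[cab]' '{print $1, $2, $3, $4}'   # prints: x y z w
```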
But what if you want the separating characters to be the left and right brackets themselves? The documentation for regular expressions (man re_format on many systems) says this:
To include a literal `]' in the list, make it the first character ...
This makes sense, given how the expression will be parsed. As the parser scans the expression, it's looking for the end, the right bracket. It doesn't care about seeing another left bracket, a comma, a space, or whatever, but a right bracket would mark the end unless there's some way to tell the parser to take it literally. Since brackets with nothing between them, [], would be useless, a right bracket as the first character is defined to mean something else: this can't be the end, so take this right bracket literally.
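Here is that rule in isolation, as a sketch on made-up input that assumes a POSIX-conforming awk. Because the ] comes first inside the brackets, it is read as a member of the list rather than as its end:

```shell
# '[]]' is a bracket expression whose only member is a literal ']'.
echo 'a]b' | awk -F '[]]' '{print $2}'      # prints: b

# More members can follow the leading ']': here ']' or ',' splits the line.
echo 'a]b,c' | awk -F '[],]' '{print $3}'   # prints: c
```

By the same logic, '[]' on its own never closes (its ] is taken literally), so awk will typically reject it rather than treat it as an empty list.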
So if you want brackets as field-separating characters, you list [ and ] between brackets, but you put the right bracket first in the list so it'll be taken literally, per the instructions: [][]
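Putting it together, the sketch below runs only the awk stage on the synopsis line quoted in the question, so it needs neither man nor col; it assumes a POSIX awk. Every [ and ] ends a field, so the text between the first [ and the first ] lands in $2:

```shell
# The synopsis line from the question, standing in for the man | col | grep stages.
synopsis='ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]'

# Split on either bracket and keep the second field: the option letters.
echo "$synopsis" | awk -F '[][]' '{print $2}'   # prints: -ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1
```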
Upvotes: 0
Reputation: 785316
In POSIX regular expressions, [...] is called a bracket expression.
It is very similar to a character class in other regex flavors. One key difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.
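To see that a backslash is an ordinary member of a POSIX bracket expression, you can try it with grep, which the question's pipeline already uses. This is a sketch on made-up input, assuming a POSIX-conforming grep in its default basic-regex mode; grep is used rather than awk because awk implementations generally apply their own backslash-escape processing to regular expressions:

```shell
# '[\]' is a complete bracket expression whose only member is a backslash,
# so the single input line (a\b) matches and grep -c counts it.
printf '%s\n' 'a\b' | grep -c '[\]'   # prints: 1
```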
If you want to include [ and ] in a bracket expression, they need to be placed correctly, i.e. ] right at the start, with [ after it.
As per the linked article:
To match a ], put it as the first character after the opening [ or the negating ^. To match a -, put it right before the closing ]. To match a ^, put it before the final literal - or the closing ].
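The three placement rules can be combined in one bracket expression. A sketch, assuming a POSIX awk and made-up input: ] goes first, ^ is not first, and - sits right before the closing ], so all three are literal separators:

```shell
# ']' first, '^' not first, '-' last: each is a literal member of the list.
echo 'a]b^c-d' | awk -F '[]^-]' '{print $1, $2, $3, $4}'   # prints: a b c d
```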
In your example:
awk -F '[][]' '...'
awk sets the (input) field separator to a single literal [ or ] character.
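To see every field this separator produces on the synopsis line from the question, you can print them all with their numbers. A sketch assuming a POSIX awk; the angle brackets are only there to make spaces and empty fields visible:

```shell
# Print each field as N:<text>; the trailing ']' leaves an empty last field.
echo 'ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]' |
  awk -F '[][]' '{ for (i = 1; i <= NF; i++) printf "%d:<%s>\n", i, $i }'
# 1:<ls >
# 2:<-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1>
# 3:< >
# 4:<file ...>
# 5:<>
```

Field 2 is the option list, which is why the original command prints $2.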
Upvotes: 1
Reputation: 37414
If you had [[]], it would mean a [ inside brackets, [[], followed by a literal ], so the field separator would be the two-character string []:
$ echo 'a[]b' | awk -F'[[]]' '{print $2}'
b
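One more check, assuming any POSIX awk and made-up input: with [[]] only the two-character sequence [] splits the line, so a lone [ or ] is left alone:

```shell
# '[[]]' means one '[' (the bracket expression '[[]') followed by a literal ']'.
echo 'a[b]c[]d' | awk -F '[[]]' '{print NF, $1, $2}'   # prints: 2 a[b]c d
```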
But with the brackets the other way around:
$ echo 'a][b' | awk -F'[][]' '{print $3}'
b
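Now $2 is empty and $3 == b, because ] and [ are each a separate one-character separator, so the empty string between them becomes a field of its own (oh dear, what have I done).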
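To see the empty field directly, you can print the field count alongside the fields. A sketch, assuming a POSIX awk; the angle brackets only make the empty field visible:

```shell
# Three fields: 'a', the empty string between ']' and '[', and 'b'.
echo 'a][b' | awk -F '[][]' '{printf "NF=%d $2=<%s> $3=<%s>\n", NF, $2, $3}'
# prints: NF=3 $2=<> $3=<b>
```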
Upvotes: 0