CJRook43

Reputation: 13

Searching for a pattern that includes square brackets with awk

I'm trying to match a pattern with awk that contains square brackets. The pattern I am trying to match is:

[senderProcess:$PROCESS_ID:val:$ID]

where PROCESS_ID and ID are existing shell variables. I have tried defining a pattern variable in my awk statement:

awk -v pattern="[senderProcess:$PROCESS_ID:val:$ID]" '$0 ~ pattern && /GCLInbox run FINE/' $innerfile

When I run this, I get the following error:

awk: cmd. line:1: (FILENAME=logset1/teach-node-06.40490.log FNR=1) fatal: invalid regexp: Invalid range end: /[senderProcess:teach-node-06:40190:val:67]/

I took this to mean that awk was interpreting the square brackets as regex special characters, so I tried escaping them:

... pattern="\[senderProcess...$ID\]" ...

This gives the same error, plus the following two warnings:

awk: warning: escape sequence `\[' treated as plain `['
awk: warning: escape sequence `\]' treated as plain `]'

I have also tried double escaping the brackets, with the same result.

I have also tried using single quotes instead of double quotes when defining pattern, but I get the same errors. In any case, my shell variables need to be expanded, which would not happen inside single quotes.

I just want to match the given pattern including its square brackets, whether by escaping the regex special characters or some other way. Any help is much appreciated.

Upvotes: 1

Views: 614

Answers (4)

RARE Kpop Manifesto

Reputation: 2895

Somehow, when I set it directly as FS without escaping, it worked, but only because awk treated the whole thing as a single character class, so every character inside it is a valid separator:

echo '[senderProcess:$PROCESS_ID:val:$ID]' | mawk -v FS='[senderProcess:$PROCESS_ID:val:$ID]' '!_<NF'
echo '[senderProcess:$PROCESS_ID:val:$ID]' | gawk -v FS='[senderProcess:$PROCESS_ID:val:$ID]' '!_<NF'
echo '[senderProcess:$PROCESS_ID:val:$ID]' | nawk -v FS='[senderProcess:$PROCESS_ID:val:$ID]' '!_<NF'
[senderProcess:$PROCESS_ID:val:$ID]
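A small sketch to confirm the "single character class" reading (`matches_re` and `set_pattern` are hypothetical names). If the unescaped pattern is one bracket expression, a single matching character is enough. The same reading likely explains the question's "Invalid range end": with the real values substituted, the set contains `teach-node-06`, and `e-0` is parsed as a range from `e` down to `0`, which is reversed.

```shell
# Hypothetical helper: print $1 if it matches the dynamic regex $2.
matches_re() {
  printf '%s\n' "$1" | awk -v re="$2" '$0 ~ re'
}

# Single quotes keep $PROCESS_ID and $ID literal, as in the commands above.
set_pattern='[senderProcess:$PROCESS_ID:val:$ID]'

# One character from the set is enough for a match:
matches_re 'P' "$set_pattern"     # prints P
matches_re 'xyz' "$set_pattern"   # prints nothing: x, y and z are not in the set
```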

To do it properly, use index(), which does a plain substring match with no regex involved:

gawk 'index($-_,__)' __='[senderProcess:$PROCESS_ID:val:$ID]'

[senderProcess:$PROCESS_ID:val:$ID]
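A minimal sketch of the index() approach on some made-up log lines (`contains_literal` is a hypothetical helper). Because index() searches for the string literally, the brackets must be present in the line, and nothing in the search string is treated as special.

```shell
# Hypothetical helper: print the stdin lines that contain $1 literally.
contains_literal() {
  awk -v s="$1" 'index($0, s)'
}

printf '%s\n' \
  'id [senderProcess:p1:val:7] GCLInbox run FINE' \
  'id senderProcess:p1:val:7 GCLInbox run FINE' |
  contains_literal '[senderProcess:p1:val:7]'
# prints only the first line; the second has no brackets

# index() returns the 1-based start position, or 0 when not found:
awk 'BEGIN { print index("ab[c]", "[c]") }'   # prints 3
```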

To escape every special character instead, you need:

 mawk  -v __='[senderProcess:$PROCESS_ID:val:$ID]' '
 BEGIN {   _=__
   gsub("[[-_!-/:-@{-~]", "[&]",_) 
   gsub("[\\\\^/]",    "\\\\&",_)

   printf("%s original pattern :\f%s\n after escaping :\f%s%s",
                             ORS = "\n\n",__,_,ORS) > ("/dev/stderr") 
_*=__=_ } ($_)~__'    
 original pattern :
                   [senderProcess:$PROCESS_ID:val:$ID]
 after escaping :
                 [[]senderProcess[:][$]PROCESS[_]ID[:]val[:][$]ID[]]

[senderProcess:$PROCESS_ID:val:$ID]
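A more readable sketch of the same escaping idea. This is a restructuring, not the code above, and `escape_ere`, `ESC_IN` and `RE` are hypothetical names. It wraps only the ERE metacharacters in a one-character bracket expression (so `:` and `_` stay as they are) and backslash-escapes `\` and `^`, which cannot be written as `[\]` or `[^]`. The input and the finished regex are both passed through `ENVIRON` rather than `-v`, because `-v` would collapse `\\` into `\`.

```shell
# Hypothetical helper: escape the ERE metacharacters in $1 for use in awk.
escape_ere() {
  ESC_IN="$1" awk 'BEGIN {
    s = ENVIRON["ESC_IN"]          # read verbatim, no escape processing
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c == "\\" || c == "^")
        out = out "\\" c           # "\" -> "\\", "^" -> "\^"
      else if (index("[].*+?(){}|$", c))
        out = out "[" c "]"        # e.g. "[" -> "[[]", "$" -> "[$]"
      else
        out = out c                # ordinary character, copied as-is
    }
    print out
  }'
}

escape_ere '[senderProcess:$PROCESS_ID:val:$ID]'
# prints [[]senderProcess:[$]PROCESS_ID:val:[$]ID[]]

# Use the result through ENVIRON as well, for the same reason:
printf '%s\n' 'x [senderProcess:$PROCESS_ID:val:$ID] y' 'senderProcess' |
  RE="$(escape_ere '[senderProcess:$PROCESS_ID:val:$ID]')" awk '$0 ~ ENVIRON["RE"]'
# prints only the first line
```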

Upvotes: 0

Daweo

Reputation: 36700

"I have also tried double escaping the brackets, with the same result."

You were close: you can get the desired result by using three backslashes (\\\). Consider the following simple example; let file.txt contain

[]
[1]
[12]

then

awk -v pattern="\\\[.\\\]" '$0 ~ pattern' file.txt

gives the output

[1]

(tested in gawk 4.2.1)
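Why three? A sketch that prints what each layer sees, using the pattern from this answer (`shell_sees` and `awk_sees` are only illustrative names). Inside double quotes the shell turns `\\` into `\` but leaves `\[` alone, and awk's `-v` escape processing then turns each `\\` into `\`. With only two backslashes, `"\\["` reaches awk as `\[`, and `-v` processing drops that backslash, which is what the warnings in the question report. With single quotes the shell does nothing, so two backslashes would be enough: `-v pattern='\\[.\\]'`.

```shell
# Layer 1, the shell: in double quotes "\\" -> "\", and "\[" stays "\[".
shell_sees="\\\[.\\\]"
printf '%s\n' "$shell_sees"   # \\[.\\]

# Layer 2, awk -v: escape processing turns each "\\" into "\".
awk_sees=$(awk -v p="$shell_sees" 'BEGIN { print p }')
printf '%s\n' "$awk_sees"     # \[.\]

# Layer 3, the regex: \[ and \] are literal brackets, "." is any one char.
printf '%s\n' '[]' '[1]' '[12]' | awk -v p="$shell_sees" '$0 ~ p'   # [1]
```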

Upvotes: 0

David C. Rankin

Reputation: 84609

When creating the dynamic regex, you can put '[' and ']' each inside its own list ([...]) so that each is treated as a literal character instead of the start or end of a list.

I would try something similar to:

awk -v pattern="[[]senderProcess:$PROCESS_ID:val:$ID[]]" '$0 ~ pattern && /GCLInbox run FINE/' "$innerfile"
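A quick check of this approach, assuming the values from the error message in the question and a few made-up log lines in place of the real file. Note that the match has to be written as `$0 ~ pattern`; a bare `pattern` is only tested for being non-empty. Any other regex metacharacter inside PROCESS_ID or ID (a `.`, for instance) would still be live.

```shell
# Values assumed from the error message in the question.
PROCESS_ID='teach-node-06:40190'
ID='67'

# "[[]" and "[]]" are one-character lists holding a literal "[" and "]".
# The "-" characters in teach-node-06 are now outside any list, so they no
# longer form ranges.
pattern="[[]senderProcess:$PROCESS_ID:val:$ID[]]"

printf '%s\n' \
  "[senderProcess:$PROCESS_ID:val:$ID] GCLInbox run FINE" \
  "[senderProcess:$PROCESS_ID:val:670] GCLInbox run FINE" \
  "senderProcess:$PROCESS_ID:val:$ID GCLInbox run FINE" |
  awk -v pattern="$pattern" '$0 ~ pattern && /GCLInbox run FINE/'
# prints only the first line
```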

Upvotes: 1

RavinderSingh13

Reputation: 133710

You should use awk's index() function instead of a regex match; try the following code. It sets some test values for the shell variables ID and PROCESS_ID (lowercase names are generally advised for shell variables, but I'm keeping the names from your sample). It then builds a shell variable named var holding the full bracketed string made from those two variables, and passes var to the awk program.

ID="test1"
PROCESS_ID="test"
var="[senderProcess:${PROCESS_ID}:val:${ID}]"
awk -v pattern="$var" 'index($0,pattern) && /GCLInbox run FINE/' Input_file
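One caveat, sketched with the same test values: `-v` still runs backslash-escape processing on its value, so a literal containing a backslash would be changed before index() sees it. Reading the string from `ENVIRON` avoids that, since environment values are taken verbatim. The `C:\bin` string is a made-up literal chosen only because it contains `\b`.

```shell
ID="test1"
PROCESS_ID="test"
var="[senderProcess:${PROCESS_ID}:val:${ID}]"

# Export var only for this awk command and read it from ENVIRON.
printf '%s\n' \
  "$var GCLInbox run FINE" \
  "senderProcess:${PROCESS_ID}:val:${ID} GCLInbox run FINE" |
  var="$var" awk 'index($0, ENVIRON["var"]) && /GCLInbox run FINE/'
# prints only the first line

# With a backslash in the literal, ENVIRON keeps it and -v does not:
printf '%s\n' 'C:\bin' | lit='C:\bin' awk 'index($0, ENVIRON["lit"])'  # prints C:\bin
printf '%s\n' 'C:\bin' | awk -v lit='C:\bin' 'index($0, lit)'          # prints nothing: -v turned \b into a backspace
```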

Upvotes: 1
