monk

Reputation: 2115

awk regex escape coming as variable

How can I run a regex match against dynamic input that may contain brackets? I supply the input on the bash command line. It comes from another program and sometimes contains brackets, and when it does, my simple good old $0 ~ var construct fails.

Here is my input data:

hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone

Command-1: works when var contains no brackets, e.g. sense:

awk -v var='sense' '$0 ~ var {print "worked"}' input
worked

Command-2: works when I use . (dot) in place of the brackets ( and ):

awk -v var='no .sense.' '$0 ~ var{print "worked"}' input
worked

Command-3: here I need to supply input with the brackets ( and ). I get no results: awk silently fails with a false negative.

awk -v var='no (sense)' '$0 ~ var {print "worked"}' input

I have already tried $0 ~ var and match($0, var); both exhibit the same behavior. I have also tried the following, but it failed as well. Because the input var is dynamic and comes from another program, I cannot escape it manually.

awk -v var='no \(sense\)' 'match($0,var){print "worked"}' input
awk: warning: escape sequence `\(' treated as plain `('
awk: warning: escape sequence `\)' treated as plain `)'

The question is: how can I pass awk an input variable that may contain brackets and still have awk do a sane regex match on it? Is that simply impossible?

TLDR:

With the sample input above, when var is no (sense), it should return ONLY which makes no (sense) to anyone

Upvotes: 1

Views: 228

Answers (3)

anubhava

Reputation: 785471

Better to ditch regex and do a plain string search with the index function:

awk -v var='no (sense)' 'index($0, var) {print "worked"; exit}' file

worked
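
One caveat with a bare index() on the question's sample data: piano (sense) is cool also contains the substring no (sense), so index() reports a hit on that line too. Below is a minimal sketch that keeps the plain string search but additionally requires a blank (or the line edge) on both sides of the hit. The helper name contains_token and the temporary file are assumptions made up for illustration, not part of the original answer.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
EOF

matches=$(awk -v var='no (sense)' '
# contains_token() is a hypothetical helper: true when needle occurs in line
# with a blank (or the line edge) on both sides. No regex is involved.
function contains_token(line, needle,    off, pos, start, prev, nxt) {
    off = 0
    while ((pos = index(substr(line, off + 1), needle)) > 0) {
        start = off + pos                                  # 1-based position in line
        prev = (start == 1) ? " " : substr(line, start - 1, 1)
        nxt  = substr(line, start + length(needle), 1)     # "" at end of line
        if ((prev == " " || prev == "\t") && (nxt == "" || nxt == " " || nxt == "\t"))
            return 1
        off = start                                        # resume after this hit
    }
    return 0
}
contains_token($0, var)    # no action block: print matching lines
' "$input")

printf '%s\n' "$matches"
rm -f "$input"
```

With the sample input this should print only which makes no (sense) to anyone, because the hit inside piano (sense) is preceded by a letter rather than a blank.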

By the way, if you want to escape, use \\ in front of the special characters, like this:

awk -v var='(^|[[:blank:]])no \\(sense\\)([[:blank:]]|$)' '
$0 ~ var {print "worked"; exit}' file
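
The doubled backslash is needed because awk applies string escape processing to -v assignments, so one level of backslashes is consumed before the regex engine ever sees the value. The small sketch below prints what actually arrives inside awk. The ENVIRON variant is an alternative not mentioned in the answer above: values read from the environment are not escape-processed, so a single backslash is enough there.

```shell
# -v assignments go through awk string escape processing: "\\(" arrives as "\(".
via_v=$(awk -v var='no \\(sense\\)' 'BEGIN { print var }')

# ENVIRON values are taken verbatim, so one backslash per metacharacter suffices.
via_env=$(var='no \(sense\)' awk 'BEGIN { print ENVIRON["var"] }')

printf '%s\n%s\n' "$via_v" "$via_env"
```

Both lines should read no \(sense\), which is the form the dynamic regex needs.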

However, if you must use regex and cannot pre-escape the content of var, you can escape all special characters in the BEGIN block like this:

awk -v var='no (sense)' '
BEGIN {
   gsub(/[^_[:alnum:] ]/, "\\\\&", var)
   var = "(^|[[:blank:]])" var "([[:blank:]]|$)"
}
$0 ~ var {print "worked"; exit}
' file

worked
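
To check this against the TLDR requirement in the question, here is a sketch of the same BEGIN-block escaping that prints the matching lines instead of worked. The temporary file and the variable names needle, re and matches are assumptions for illustration. Note that -v still applies escape processing, so a needle that itself contains backslashes would need to be passed another way, for example through the environment.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
EOF

needle='no (sense)'    # stands in for the value coming from the other program

matches=$(awk -v var="$needle" '
BEGIN {
    # Put a backslash before every character that is not alphanumeric,
    # an underscore or a space, so it loses any regex meaning.
    gsub(/[^_[:alnum:] ]/, "\\\\&", var)
    # Require a blank or the line edge on both sides of the escaped text.
    re = "(^|[[:blank:]])" var "([[:blank:]]|$)"
}
$0 ~ re    # no action block: print matching lines
' "$input")

printf '%s\n' "$matches"
rm -f "$input"
```

On the sample data this should print only which makes no (sense) to anyone. The piano (sense) line is rejected because of the blank-or-edge anchors.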

Upvotes: 1

RARE Kpop Manifesto

Reputation: 2855

INPUT

hello there
this is monk
and this is a random data
which makes no (sense) to anyone

CODE

 {m,n,g}awk -v __='no (sense)' '
 BEGIN {
    gsub("[[-\140!-/\\]{-~:-@]",
                   "[&]",    __)
    gsub(/[\\^]/, "\\\\&",__)
     OFS = "worked"
     FS  = "^.*[^[:alpha:]]?"(__)".*$" } NF*=!_<NF'

OUTPUT

worked

To give a sense of what those two gsub() calls do to ASCII:

 anything from "!" to "~" that isn't alphanumeric gets 
 safely "caged" in square brackets, 

 regardless of whether it's considered a metacharacter or not, 
 which differs among awk flavors.

=

 [!]  ["]  [#]  [$]   [%]  [&]   [']  [(]
 [)]  [*]  [+]  [,]   [-]  [.]   [/]  0
 1    2    3    4     5    6     7    8
 9    [:]  [;]  [<]   [=]  [>]   [?]  [@]

 A    B    C    D     E    F     G    H
 I    J    K    L     M    N     O    P
 Q    R    S    T     U    V     W    X
 Y    Z    [[]  [\\]  []]  [\^]  [_]  [`]

 a    b    c    d     e    f     g    h
 i    j    k    l     m    n     o    p
 q    r    s    t     u    v     w    x
 y    z    [{]  [|]   [}]  [~]

Upvotes: 0

Kent

Reputation: 195169

As an alternative to escaping the characters that have special meaning in ERE, you can consider using a character class:

$ awk -v var='no [(]sense[)]' '$0 ~ var {print "worked"}' file
worked

IMO, [] can be easier to read than escapes in some cases.
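
When var is dynamic, the brackets can also be added inside awk rather than typed by hand. The following is a rough sketch of that idea, assuming an awk with POSIX character classes. The sample lines fed through printf and the names re and matches are made up for illustration. Since ^ and \ keep a special meaning inside brackets, those two get an extra backslash in a second pass; how bracket expressions treat backslashes can differ between awk flavors, so test this on the awk you actually use.

```shell
matches=$(printf '%s\n' \
    'piano (sense) is cool' \
    'which makes no sense' \
    'which makes no (sense) to anyone' |
awk -v var='no (sense)' '
BEGIN {
    # Wrap each character that is not alphanumeric, underscore or space in [ ].
    gsub(/[^[:alnum:]_ ]/, "[&]", var)
    # "^" and "\" are still special inside brackets, so escape those two.
    gsub(/[\\^]/, "\\\\&", var)
    # var is now: no [(]sense[)]
    re = "(^|[[:blank:]])" var "([[:blank:]]|$)"
}
$0 ~ re    # no action block: print matching lines
')

printf '%s\n' "$matches"
```

Of the three sample lines, only which makes no (sense) to anyone should be printed.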

Upvotes: 0
