Reputation: 2115
How to regex-match on dynamic input which may have brackets in it
I am supplying the input via the bash command line. The input comes from another program and sometimes contains brackets, and when it does my simple, good old $0 ~ var construct fails.
Here is my input data:
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
Command-1: worked, with no brackets in var, e.g. sense
awk -v var='sense' '$0 ~ var {print "worked"}' input
worked
Command-2: worked, when I used . (dot) in place of the brackets ( and ).
awk -v var='no .sense.' '$0 ~ var{print "worked"}' input
worked
Command-3: Here I need to supply input containing the brackets ( and ). Things go crazy and I get no results: awk silently fails with a false negative.
awk -v var='no (sense)' '$0 ~ var {print "worked"}' input
I have already tried $0 ~ var and match($0, var); both exhibit the same behavior. I also tried the following, but it failed miserably. Because the input var is dynamic and comes from another program, I cannot escape it manually.
awk -v var='no \(sense\)' 'match($0,var){print "worked"}' input
awk: warning: escape sequence `\(' treated as plain `('
awk: warning: escape sequence `\)' treated as plain `)'
The question is: how can I pass awk an input variable that may contain brackets and still have awk do a sane regex operation on it? Is that simply impossible?
TLDR:
When working with the sample input data above, when var is no (sense), it should ONLY return which makes no (sense) to anyone
Upvotes: 1
Views: 228
Reputation: 785471
Better to ditch regex and use a plain string search with the index function:
awk -v var='no (sense)' 'index($0, var) {print "worked"; exit}' file
worked
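One thing to watch with a bare index() call: it is a substring search, so "no (sense)" is also found inside "piano (sense) is cool". If only the whole phrase should count, a boundary check can be layered on top. Below is a minimal sketch, assuming a non-empty search string and blanks (space or tab) as delimiters; has_word is a hypothetical helper name, not an awk built-in, and the temporary file only stands in for the question's input.

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
EOF

# has_word() (hypothetical helper): literal search via index(), plus a check
# that each hit is delimited by blanks or by the start/end of the line.
result=$(awk -v var='no (sense)' '
function has_word(line, needle,    pos, off, before, after) {
    off = 0
    while ((pos = index(substr(line, off + 1), needle)) > 0) {
        pos += off
        before = (pos == 1) ? " " : substr(line, pos - 1, 1)
        after  = substr(line, pos + length(needle), 1)
        if (before ~ /[ \t]/ && (after == "" || after ~ /[ \t]/))
            return 1
        off = pos    # resume the search one character past this hit
    }
    return 0
}
has_word($0, var)
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

No regex is built from var here, so none of its characters can be misread as metacharacters.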
By the way, if you want to escape, use \\ to escape special characters, like this:
awk -v var='(^|[[:blank:]])no \\(sense\\)([[:blank:]]|$)' '
$0 ~ var {print "worked"; exit}' file
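The doubled backslash is needed because awk processes escape sequences in -v assignments, so \( is reduced to ( before the regex engine ever sees it (hence the warnings in the question). A possible alternative is to pass the pattern through the environment, since ENVIRON values are not escape-processed. This is a minimal sketch under that assumption; the variable name re is arbitrary, and a plain space is used as the delimiter instead of [[:blank:]].

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
EOF

# The pattern is placed in the environment of this one awk process.
# Single backslashes are enough because ENVIRON is read verbatim.
result=$(re='(^| )no \(sense\)( |$)' awk '$0 ~ ENVIRON["re"]' "$input")

printf '%s\n' "$result"
rm -f "$input"
```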
However, if you must use regex and you cannot pre-escape the content of var, then you can escape all special characters in the BEGIN block like this:
awk -v var='no (sense)' '
BEGIN {
   gsub(/[^_[:alnum:] ]/, "\\\\&", var)
   var = "(^|[[:blank:]])" var "([[:blank:]]|$)"
}
$0 ~ var {print "worked"; exit}
' file
worked
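A caveat on escaping every non-alphanumeric character: in gawk, sequences such as \<, \>, \` and \' are regex operators in their own right, so input containing < or > could end up meaning something else after a blanket escape. A narrower variant escapes only the ERE metacharacters. The following is a sketch under that assumption; ere_escape is a hypothetical helper name, and [ \t] is used in place of [[:blank:]] only for the sake of older awks.

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
EOF

# ere_escape() (hypothetical helper) puts a backslash in front of the ERE
# metacharacters only:  \ . [ ] ( ) * + ? { } | ^ $
result=$(awk -v var='no (sense)' '
function ere_escape(s) {
    gsub(/[][\\.^$(){}|*+?]/, "\\\\&", s)
    return s
}
BEGIN { re = "(^|[ \t])" ere_escape(var) "([ \t]|$)" }
$0 ~ re
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

If var itself may contain backslashes, -v will already have interpreted them before the BEGIN block runs; reading the value from ENVIRON instead would avoid that.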
Upvotes: 1
Reputation: 2855
INPUT
hello there
this is monk
and this is a random data
which makes no (sense) to anyone
CODE
{m,n,g}awk -v __='no (sense)' '
BEGIN {
gsub("[[-\140!-/\\]{-~:-@]",
"[&]", __)
gsub(/[\\^]/, "\\\\&",__)
OFS = "worked"
FS = "^.*[^[:alpha:]]?"(__)".*$" } NF*=!_<NF'
OUTPUT
worked
To give a sense of what those 2 gsub() calls do to ASCII:
anything from "!" to "~" that isn't alphanumeric gets safely "caged" in square brackets, regardless of whether it's considered a metacharacter or not, which differs among awk flavors. The backslash and the caret additionally get a backslash in front of them inside their brackets.
[!] ["] [#] [$] [%] [&] ['] [(]
[)] [*] [+] [,] [-] [.] [/] 0
1 2 3 4 5 6 7 8
9 [:] [;] [<] [=] [>] [?] [@]
A B C D E F G H
I J K L M N O P
Q R S T U V W X
Y Z [[] [\\] []] [\^] [_] [`]
a b c d e f g h
i j k l m n o p
q r s t u v w x
y z [{] [|] [}] [~]
Upvotes: 0
Reputation: 195169
As an alternative to escaping the characters that have special meaning in ERE, you can consider using a character class (bracket expression):
$ awk -v var='no [(]sense[)]' '$0 ~ var {print "worked"}' file
worked
IMO, [] could be easier to read than escapes in some cases.
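Since var is dynamic in the question, the bracket form would have to be generated rather than typed. Here is a rough sketch of one way to do that, assuming ASCII input: cage is a hypothetical helper that wraps each non-alphanumeric character in [], except the backslash and the caret, which are backslash-escaped instead because they do not behave as plain literals inside brackets in every awk.

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
EOF

result=$(awk -v var='no (sense)' '
function cage(s,    i, c, out) {
    out = ""
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c ~ /[A-Za-z0-9_ ]/)  out = out c            # keep as is
        else if (c == "\\")       out = out "\\\\"       # backslash
        else if (c == "^")        out = out "\\^"        # caret
        else                      out = out "[" c "]"    # cage the rest
    }
    return out
}
BEGIN {
    re = cage(var)
    print "pattern: " re
    re = "(^| )" re "( |$)"    # whole-phrase match, space-delimited
}
$0 ~ re
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The first output line shows the generated pattern, which for this input is the same no [(]sense[)] form as typed by hand above; the second is the only matching line.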
Upvotes: 0