Reputation: 1
I'd like to pass a couple of shell variables to an awk command that then uses a regex to match them against a field. However, I want the variables' contents to be treated as literals within the regex. All of this is done for each line of an input file.
So this:
123^A
would be found in any of these:
123^A|field2|field3
123^A~000^A|field2|field3
000^A~123^A|field2|field3
000^A~123^A~999^A|field2|field3
but not in any of these:
123^B|field2|field3
1234^A|field2|field3
123|field2|field3
123~000|field2|field3
Example that isn't working:
read inputfile?'Enter the input file: '
read tackedonvalue?'Enter the value to tack onto each input value: '
read searchfile?'Enter the search file: '
read fieldnum?'Enter the field number to search: '
read delim?'Enter the field delimiter: '
while read -r SEARCHTERM
do awk -F"${delim}" -v a="(^|~)${SEARCHTERM}${tackedonvalue}(~|$)" -v COL="${fieldnum}" '$COL ~ /a/' ${searchfile} >> output_file.txt
done < ${inputfile}
($inputfile is a variable from the input prompts above, as is $tackedonvalue.)
What makes this example not work is that the $tackedonvalue variable will often have ^ characters in it, which would then need to be escaped for the regex. (Escaping them manually in the input isn't an option.) There may also be other special characters entered in that variable that would also need to be escaped, and so I don't want to have to find/replace on every special character for every case.
Another example that I tried first but couldn't get to work (same input prompts and while read as before):
awk -F"${delim}" -v a="${SEARCHTERM}" -v b="${tackedonvalue}" -v COL="$fieldnum" '$COL ~ ("(^|~)" a b "(~|$)")' ${searchfile} >> output_file.txt
I think this didn't work because of the starting and ending anchors, but I could not figure out how to fix those and so had to use the regex constant (/pattern/ with the forward slashes).
If the anchors could be fixed for this second example AND the variable contents would be treated as literals, then this would be another route.
P.S. - First post so let me know what to change/improve/provide.
Upvotes: 0
Views: 1078
Reputation: 781706
You need to escape the ^ in the search term because it has special meaning in regular expressions.
SEARCHTERM=${SEARCHTERM//^/\\^}
If your search term might include other characters that have special meaning in regular expressions, you'll need to replace them all. This would be easier done in awk itself:
awk -v -F"$delim" search="$SEARCHTERM" -v tacked="$tackedonvalue" -v col="$fieldnum" '
BEGIN {gsub(/[]*+^$\\]/, "\\\\&", search); pattern = "(^|~)" search tacked "(~|$)" }
$col ~ pattern' "$searchfile"
BTW, you shouldn't use all-uppercase shell variables. The convention is that these names are reserved for environment variables.
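Putting those pieces together, here is a minimal sketch of how that command could slot into the while read loop from the question (same prompts and files as above, with SEARCHTERM renamed to lowercase per the note; untested, just to show the wiring):
while read -r searchterm
do
    awk -F"$delim" -v search="$searchterm" -v tacked="$tackedonvalue" -v col="$fieldnum" '
        BEGIN { gsub(/[][^$.*+?(){}|\\]/, "\\\\&", search)   # make both inputs literal
                gsub(/[][^$.*+?(){}|\\]/, "\\\\&", tacked)
                pattern = "(^|~)" search tacked "(~|$)" }
        $col ~ pattern' "$searchfile" >> output_file.txt
done < "$inputfile"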
But maybe you shouldn't use a pattern match at all. I think you could just split the field on the ~ character, then loop over that array testing if any of the elements match the search string.
split($col, array, "~");
for (i in array) if (array[i] == (search tacked)) { print; break }
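A sketch of how that could look as a complete command, assuming the same shell variables as in the question (this wiring isn't part of the original answer); note that no escaping is needed here, because it's a plain string comparison rather than a regex match:
awk -F"$delim" -v search="$SEARCHTERM" -v tacked="$tackedonvalue" -v col="$fieldnum" '
{
    split($col, array, "~")                     # break the field into its ~-separated values
    for (i in array)
        if (array[i] == (search tacked)) { print; break }
}' "$searchfile" >> output_file.txt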
Upvotes: 1
Reputation: 52529
I don't know about awk, but this is easy to do with perl:
$ cat a.txt
123^A|field2|field3
123^A~000^A|field2|field3
000^A~123^A|field2|field3
000^A~123^A~999^A|field2|field3
123^B|field2|field3
1234^A|field2|field3
123|field2|field3
123~000|field2|field3
$ export PAT=123^A
$ export FIELDNUM=0
$ perl -F'\|' -le "print if \$F[${FIELDNUM}] =~ /(^|~)\Q${PAT}\E(~|$)/" a.txt
123^A|field2|field3
123^A~000^A|field2|field3
000^A~123^A|field2|field3
000^A~123^A~999^A|field2|field3
Anything between \Q and \E in a regular expression has any metacharacters automatically escaped/ignored.
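For instance (a small illustration, not from the original answer), a pattern like 1+1 normally treats + as a quantifier, but inside \Q...\E it only matches the literal text:
$ perl -le 'print "11" =~ /1+1/ ? "match" : "no match"'
match
$ perl -le 'print "11" =~ /\Q1+1\E/ ? "match" : "no match"'
no match
$ perl -le 'print "1+1" =~ /\Q1+1\E/ ? "match" : "no match"'
match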
Upvotes: 0