Remon

Reputation: 63

Awk, gsub, ampersands and unexpected expansion

First, apologies for the potentially duplicate question. I'm new to bash scripting and I can't even figure out which keywords to search for. With that said, I tried to simplify the problem description as much as I can:

I have a text file (test.txt) that contains only this line:

REPLACE

I ran the following command, which is supposed to replace the file's text (i.e. REPLACE) with the value of the code variable, if (A & B):

code="if (A & B)" ; awk -v var="${code}" '{ gsub(/REPLACE/, var); print }' test.txt

Expected output: I expect the value of the code variable to be printed as is:

if (A & B)

Actual output: somehow the ampersand is expanded into 'REPLACE', which is the regexp argument of gsub:

if (A REPLACE B)
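The behaviour can be reproduced in isolation. In the replacement string of gsub (and sub), & stands for the text the regexp matched, while \& gives a literal ampersand. A minimal sketch, assuming bash and a POSIX-compatible awk such as the GNU Awk 4.1.4 from the question; the bracketed replacements and the variable names with_amp and escaped are illustrative only:

```shell
# In a gsub() replacement string, & stands for the text the regexp matched
with_amp=$(printf 'REPLACE\n' | awk '{ gsub(/REPLACE/, "[&]"); print }')
# The awk string "[\\&]" reaches gsub() as [\&], and \& yields a literal &
escaped=$(printf 'REPLACE\n' | awk '{ gsub(/REPLACE/, "[\\&]"); print }')
printf '%s\n' "$with_amp" "$escaped"   # prints [REPLACE] then [&]
```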

Perhaps I need to escape the ampersand, but unfortunately the code variable is populated outside my control, so I can't edit its value manually.

FYI, the awk version is "GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)".

Thanks!

Upvotes: 6

Views: 2017

Answers (3)

izissise

Reputation: 943

I had the same problem today. With the help of the other answers, I wrote the following awk function:

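function sanesub(rex, val,    p) {   # the extra parameter p makes p a local variable
   # replace the first match of rex in $0 with val, inserted literally
   p = match($0, rex)
   if (p != 0) {
      $0 = substr($0, 1, p-1) val substr($0, p+RLENGTH)
   }
   return p
}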
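A usage sketch for the function above, assuming bash and a POSIX awk; the input line is piped in rather than read from test.txt, and the variable name result is illustrative. Because val is concatenated with substr() instead of being passed to gsub, an & in it stays literal. Unlike gsub, sanesub replaces only the first match on each line.

```shell
code="if (A & B)"
result=$(printf 'REPLACE\n' | awk -v var="$code" '
# sanesub: replace the first match of rex in $0 with val, inserted literally
function sanesub(rex, val,    p) {
    p = match($0, rex)
    if (p != 0)
        $0 = substr($0, 1, p-1) val substr($0, p+RLENGTH)
    return p
}
{ sanesub("REPLACE", var); print }')
printf '%s\n' "$result"   # if (A & B)
```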

Upvotes: 0

Mervstar

Reputation: 11

You can just double-escape the '&' character, so your code would be:

code="if (A \\\& B)" ; awk -v var="${code}" '{ gsub(/REPLACE/, var); print }' test.txt

Output:
# code="if (A \\\& B)" ; awk -v var="${code}" '{ gsub(/REPLACE/, var); print }' test.txt
if (A & B)
#

Note that in the above example you need to escape both the '\' and the '&' characters, which is why it's '\\\&'.
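The escaping layers can be checked one at a time. A minimal sketch, assuming bash (or another POSIX shell) and a POSIX awk; the variable names via_v and result are illustrative, and the input is piped in instead of read from test.txt:

```shell
code="if (A \\\& B)"
# Layer 1: inside double quotes the shell turns \\ into \ and leaves \& as is
printf '%s\n' "$code"      # if (A \\& B)
# Layer 2: awk -v processes escape sequences, turning \\ into \
via_v=$(awk -v var="$code" 'BEGIN { print var }')
printf '%s\n' "$via_v"     # if (A \& B)
# Layer 3: in the gsub() replacement, \& yields a literal &
result=$(printf 'REPLACE\n' | awk -v var="$code" '{ gsub(/REPLACE/, var); print }')
printf '%s\n' "$result"    # if (A & B)
```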

If you don't want to manipulate your input strings manually as in the above example, you can use an additional gsub in your awk code to preprocess the input string, adding the escape characters before running your main gsub, as follows:

code="if (A & B)" ; awk -v var="${code}" '{ gsub("&","\\\\&", var); gsub(/REPLACE/, var); print }' test.txt

Output:
# code="if (A & B)" ; awk -v var="${code}" '{ gsub("&","\\\\&", var); gsub(/REPLACE/, var); print }' test.txt
if (A & B)
#

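Note the need for four '\' characters in the preprocessing gsub: the awk string "\\\\&" reaches gsub as \\&, which produces a literal '\' followed by the matched '&'.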
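To see what the preprocessing step produces, print var right after the first gsub. A sketch, assuming an awk that follows the POSIX rules for backslashes in sub/gsub replacements (GNU Awk does); the variable names escaped and result are illustrative:

```shell
code="if (A & B)"
# "\\\\&" in awk source reaches gsub() as \\& : a literal \ followed by the matched &
escaped=$(awk -v var="$code" 'BEGIN { gsub("&", "\\\\&", var); print var }')
printf '%s\n' "$escaped"   # if (A \& B)
# The second gsub() then turns \& back into a literal &
result=$(printf 'REPLACE\n' | awk -v var="$code" '{ gsub("&", "\\\\&", var); gsub(/REPLACE/, var); print }')
printf '%s\n' "$result"    # if (A & B)
```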

Upvotes: 1

Ed Morton

Reputation: 203109

& is a backreference metacharacter in many tools, and it means "the string that matched the regexp you searched for". If you're trying to use literal strings, then use literal strings instead of regexps and backreferences.

e.g.:

code="if (A & B)"
awk -v old="REPLACE" -v new="$code" 's=index($0,old){$0=substr($0,1,s-1) new substr($0,s+length(old))} 1' test.txt
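The one-liner above replaces only the first occurrence of old on each line. Below is a sketch of one way to apply the same index()/substr() idea to every occurrence; the loop and the variables rest, out and result are an illustrative extension, not part of the original answer, and the input is piped in rather than read from test.txt. Since scanning resumes after each inserted replacement, text inside new is never rescanned.

```shell
code="if (A & B)"
result=$(printf 'REPLACE and REPLACE\n' | awk -v old="REPLACE" -v new="$code" '
{
    rest = $0; out = ""
    # old != "" guards against an endless loop on an empty search string
    while (old != "" && (s = index(rest, old)) > 0) {
        out  = out substr(rest, 1, s-1) new    # text before the match, then new verbatim
        rest = substr(rest, s + length(old))   # continue scanning after the match
    }
    print out rest
}')
printf '%s\n' "$result"   # if (A & B) and if (A & B)
```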

The alternative, trying to sanitize regexps and replacements, is complicated and error-prone and generally not for the faint of heart; see: Is it possible to escape regex metacharacters reliably with sed

Upvotes: 8
