Henrique Barcelos

Reputation: 7900

Egrep expression: how to unescape single quotes when reading from file?

I need to use egrep to obtain an entry in an index file.

In order to find the entry, I use the following command:

egrep "^$var_name" index

$var_name is the variable read from a var list file:

while read var_name; do
    egrep "^$var_name" index
done < list

One of the possible keys usually comes in this format:

$ERROR['SOME_VAR']

My index file is in the form:

$ERROR['SOME_VAR'] --> n

Where n is the line where the variable is found.

The problem is that $var_name is automatically escaped when read. When I enable the debug mode, I get the following command being executed:

+ egrep '^$ERRORS['\''SELECT_COUNTRY'\'']' index

The command above doesn't work, because egrep will try to interpret the pattern.

If I use fixed-string matching instead of the extended version (fgrep or grep -F), the command works, but only if I remove the ^ anchor:

grep -F "$var_name" index # this actually works

The problem is that I need to ensure that the match is made at the beginning of the line.

Ideas?

Upvotes: 1

Views: 240

Answers (2)

that other guy

Reputation: 123470

set -x shows the command being executed in shell notation.

The backslashes you see do not become part of the argument; they're just printed by set -x to show the executed command in a copy-pastable format.
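A quick way to confirm this is to print the argument itself rather than the trace. The following is a minimal sketch; the helper name show_arg is made up for illustration, and the key is the one from the question's debug output.

```shell
# Key from the question; \$ keeps the dollar sign literal inside double quotes.
var_name="\$ERRORS['SELECT_COUNTRY']"

# show_arg (hypothetical helper): print its first argument exactly as received.
show_arg() {
  printf '%s\n' "$1"
}

# Same quoting as the egrep call in the question's loop.
show_arg "^$var_name"
```

The printed argument contains the plain single quotes and no backslashes, which is what egrep receives as its pattern.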

Your problem is not too much escaping, but too little: $ in regex means "end of line", so ^$ERROR will never match anything. Similarly, [ ] is a bracket expression that matches any single character from the listed set, so it will not match literal square brackets.

The correct regex to match your pattern would be ^\$ERROR\['SOME_VAR'], equivalent to the shell argument in egrep "^\\\$ERROR\['SOME_VAR']".
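To see the escaped pattern at work, here is a small sketch against a throwaway index file. The file contents and line numbers are made up for illustration, and grep -E is used as the current spelling of egrep.

```shell
# Build a temporary index file (contents are illustrative only).
index_file=$(mktemp)
printf '%s\n' "\$ERROR['SOME_VAR'] --> 12" "OTHER --> 3" > "$index_file"

# Inside double quotes, \\ yields one backslash and \$ a literal dollar sign,
# so grep receives the regex ^\$ERROR\['SOME_VAR']
match=$(grep -E "^\\\$ERROR\['SOME_VAR']" "$index_file")
printf '%s\n' "$match"

rm -f "$index_file"
```

Only the line that starts with the literal key is printed.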

Your options to fix this are:

  1. If you expect to be able to use regex in your input file, you need to include regex escapes like above, so that your patterns are valid.

  2. If you expect to be able to use arbitrary, literal strings, use a tool that can match literally and still anchor the match at the start of the line. This requires jumping through some hoops, since grep -F matches literally but cannot anchor to the start of the line.

Here's one with awk:

while IFS= read -r line
do
  export line
  gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list

It passes the string in through the environment (because -v processes backslash escape sequences in the value, which would mangle keys containing backslashes) and then compares it literally against the start of each input line.
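The difference between -v and the environment can be checked with a value that contains a backslash sequence. This is a minimal sketch; the sample value and the variable names are assumptions chosen for the demonstration.

```shell
# Sample value: four characters, a, backslash, t, b.
value='a\tb'

# With -v, awk processes escape sequences, so \t becomes a single tab.
via_v=$(awk -v var="$value" 'BEGIN { print length(var) }')

# Through the environment, the value arrives untouched.
via_env=$(value="$value" awk 'BEGIN { print length(ENVIRON["value"]) }')

printf '%s %s\n' "$via_v" "$via_env"
```

The two lengths differ because only the -v route rewrites the backslash sequence, which is why the environment is the safer channel for arbitrary keys.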

Here's an example invocation:

$ cat script
while IFS= read -r line
do
  export line
  gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list

$ cat list
$ERRORS['SOME_VAR']
\E and \Q
'"'%@#%*'

$ cat index
hello world
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%@#%*' too
etc

$ bash script
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%@#%*' too
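If staying with grep is preferable, option 1 can be automated by escaping each key before building the regex. The sketch below is an alternative to the awk approach, not part of it; escape_ere and lookup are hypothetical helper names, and the sed expression covers the characters that are special in a POSIX extended regex.

```shell
# escape_ere (hypothetical helper): backslash-escape ERE metacharacters.
escape_ere() {
  printf '%s\n' "$1" | sed 's/[.[\*^$()+?{|]/\\&/g'
}

# lookup LIST INDEX: print INDEX lines that start with a key from LIST.
lookup() {
  while IFS= read -r key; do
    grep -E -- "^$(escape_ere "$key")" "$2"
  done < "$1"
}
```

It would be called as lookup list index. Empty lines in the list are not handled here: an empty key produces the pattern ^, which matches every line.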

Upvotes: 1

anubhava

Reputation: 785128

You can use printf "%q" (note that it applies shell quoting, which is not always the same as regex escaping):

while read -r var_name; do
    egrep "^$(printf "%q\n" "$var_name")" index
done < list

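Update: If your grep supports Perl-compatible regexes (GNU grep's -P option), you can also do:

while read -r var_name; do
    grep -P "^\Q$var_name\E" index
done < list

Here \Q and \E make the string between them literal, removing the special meaning of all regex symbols. This is Perl regex syntax, which egrep does not understand, and it breaks if the key itself contains \E.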

Upvotes: 0
