Reputation: 7900
I need to use egrep to obtain an entry in an index file. In order to find the entry, I use the following command:
egrep "^$var_name" index
$var_name is the variable read from a var list file:
while read var_name; do
egrep "^$var_name" index
done < list
One of the possible keys usually comes in this format:
$ERROR['SOME_VAR']
My index file is in the form:
$ERROR['SOME_VAR'] --> n
where n is the line where the variable is found.
The problem is that $var_name is automatically escaped when read. When I enable debug mode, I see the following command being executed:
+ egrep '^$ERRORS['\''SELECT_COUNTRY'\'']' index
The command above doesn't work because egrep tries to interpret the pattern as a regular expression.
If I don't use the extended version and use grep or fgrep instead, the command works only if I remove the ^ anchor:
grep -F "$var_name" index # this actually works
The problem is that I need to ensure that the match is made at the beginning of the line.
Ideas?
Upvotes: 1
Views: 240
Reputation: 123470
set -x shows the command being executed in shell notation. The backslashes and quotes you see do not become part of the argument; set -x just prints them to show the executed command in a copy-pastable format.
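To check this yourself, pass the same string to a command that prints its arguments verbatim. A minimal sketch, assuming bash; show_args is a hypothetical helper and the key is the one from the question's trace. The trace on stderr shows the quoted form, while stdout shows what the command actually received:

```bash
#!/usr/bin/env bash
# Key taken from the question's trace; the '\'' sequences are only shell quoting.
var_name='$ERRORS['\''SELECT_COUNTRY'\'']'

# show_args: hypothetical helper that prints each argument it receives inside <...>
show_args() { printf '<%s>\n' "$@"; }

set -x
show_args "^$var_name"   # stderr: the shell-quoted, copy-pastable trace
set +x
# stdout: <^$ERRORS['SELECT_COUNTRY']>  (no backslashes in the real argument)
```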
Your problem is not too much escaping, but too little: $ in a regex means "end of line", so ^$ERROR will never match anything. Similarly, [...] is a bracket expression (a character class), so ['SOME_VAR'] matches a single character from that set rather than literal square brackets.
The correct regex to match your pattern would be ^\$ERROR\['SOME_VAR'], which you can pass from the shell as egrep "^\\\$ERROR\['SOME_VAR']".
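A quick comparison of the two patterns against a sample index, as a sketch (grep -E is equivalent to egrep; the index line and its line number 12 are made up):

```bash
#!/usr/bin/env bash
# Sample index in the question's "KEY --> n" format (contents are made up).
index_file=$(mktemp)
trap 'rm -f "$index_file"' EXIT
printf '%s\n' "\$ERROR['SOME_VAR'] --> 12" 'some other entry' > "$index_file"

# Unescaped: $ is an end-of-line anchor and ['SOME_VAR'] is a one-character
# bracket expression, so this pattern can never match.
grep -E "^\$ERROR['SOME_VAR']" "$index_file" || echo 'unescaped pattern: no match'

# Escaped: \$ and \[ are literal; a lone ] is already literal in an ERE.
grep -E "^\\\$ERROR\['SOME_VAR']" "$index_file"   # prints: $ERROR['SOME_VAR'] --> 12
```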
Your options to fix this are:
1. If you expect to be able to use regexes in your input file, you need to include regex escapes like the ones above, so that your patterns are valid.
2. If you expect to be able to use arbitrary literal strings, use a tool that can match both flexibly and literally. This requires jumping through some hoops, since for legacy reasons UNIX tools are very sloppy about treating input as literal text.
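For option 1, the escaping can be automated instead of written by hand. A minimal bash sketch, assuming each key should be matched literally: the hypothetical ere_escape helper backslash-escapes every character POSIX lists as special in an ERE outside brackets (. [ \ ( ) * + ? { | ^ $), and the demo uses throwaway stand-ins for the list and index files:

```bash
#!/usr/bin/env bash
# ere_escape: hypothetical helper; prefixes each ERE special character with a backslash.
ere_escape() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '.'|'['|'\'|'('|')'|'*'|'+'|'?'|'{'|'|'|'^'|'$') out+="\\$c" ;;
      *) out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Throwaway stand-ins for the question's "list" and "index" files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "\$ERROR['SOME_VAR']" > "$dir/list"
printf '%s\n' "\$ERROR['SOME_VAR'] --> 12" 'hello world' > "$dir/index"

# Same loop as in the question, but with the key escaped before anchoring it.
while IFS= read -r var_name; do
  grep -E "^$(ere_escape "$var_name")" "$dir/index"
done < "$dir/list"
```

The characters ] and } are not in POSIX's list of ERE special characters, so the sketch leaves them as they are.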
Here's one with awk:
while IFS= read -r line
do
export line
# awk strings are 1-indexed: compare the first length(var) characters of each line with var
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
It passes the string in through the environment (because -v is sloppy: awk applies escape-sequence processing to -v values, so backslashes in the key would be altered) and then compares it literally against the beginning of each input line.
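The -v problem is easy to reproduce. A sketch using any POSIX awk and a made-up key containing the two characters \ and t:

```bash
#!/usr/bin/env bash
# Made-up key: a backslash followed by t, not a TAB.
key='a\tb'

# -v: the value goes through escape-sequence processing, so \t becomes a TAB.
awk -v var="$key" 'BEGIN { printf "via -v:      [%s]\n", var }'

# ENVIRON: awk sees the exact string from the environment, backslash included.
key="$key" awk 'BEGIN { printf "via ENVIRON: [%s]\n", ENVIRON["key"] }'
```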
Here's an example invocation:
$ cat script
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
$ cat list
$ERRORS['SOME_VAR']
\E and \Q
'"'%@#%*'
$ cat index
hello world
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%@#%*' too
etc
$ bash script
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%@#%*' too
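As an alternative (not part of the answer above), awk can read the keys straight from the list file in a single pass. Keys read as input lines get no escape processing, so no environment variable is needed, and awk runs once instead of once per key. A sketch using the example files from above:

```bash
#!/usr/bin/env bash
# Throwaway copies of the example list and index files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cat > "$dir/list" <<'EOF'
$ERRORS['SOME_VAR']
\E and \Q
'"'%@#%*'
EOF
cat > "$dir/index" <<'EOF'
hello world
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%@#%*' too
etc
EOF

# First file: store each key verbatim.
# Second file: print a line once if any stored key is a literal prefix of it.
awk 'FILENAME == ARGV[1] { keys[$0]; next }
     { for (k in keys) if (k != "" && index($0, k) == 1) { print; next } }' \
    "$dir/list" "$dir/index"
```

Unlike the per-key loop, this prints matches in index order and prints each index line at most once.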
Upvotes: 1
Reputation: 785128
You can use printf "%q":
while read -r var_name; do
egrep "^$(printf "%q\n" "$var_name")" index
done < list
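Update: You can also do:
while read -r var_name; do
grep -P "^\Q$var_name\E" index
done < list
Here \Q and \E make the string between them literal, removing the special meaning of any regex symbols. Note that \Q...\E is PCRE syntax, so this needs grep -P; the ERE syntax used by egrep does not support it. It also breaks if the string itself contains \E.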
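A sketch of the \Q...\E approach with grep -P, assuming GNU grep built with PCRE support. The sed step is an added workaround, not part of the answer above: it rewrites each \E in the key as \E\\E\Q, so that a key like \E and \Q from the first answer's example still matches literally. File contents are sample data:

```bash
#!/usr/bin/env bash
# Needs grep -P (PCRE); egrep / grep -E has no \Q...\E.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "\$ERRORS['SOME_VAR'] = 'foo';" '\E and \Q are valid strings' > "$dir/index"
printf '%s\n' "\$ERRORS['SOME_VAR']" '\E and \Q' > "$dir/list"

while IFS= read -r var_name; do
  # A literal \E in the key would end the quoting early, so rewrite each \E
  # as \E\\E\Q: close the quote, match a literal backslash + E, reopen it.
  quoted=$(printf '%s\n' "$var_name" | sed 's/\\E/\\E\\\\E\\Q/g')
  grep -P "^\Q$quoted\E" "$dir/index"
done < "$dir/list"
```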
Upvotes: 0