ruanhao

Reputation: 4922

When to escape special characters in shell

Guys,
I find it hard to judge when special characters need to be escaped in the shell, and which characters should be escaped. For example:

sed '/[0-9]\{3\}/d' filename.txt  

In the command above, why do we have to escape { but leave [ unescaped? I think they are both special characters.
Can you help me with this?

/br
ruan

Upvotes: 1

Views: 570

Answers (4)

NeronLeVelu

Reputation: 10039

It mainly depends on the sed version (POSIX-compliant or extended behavior), and then you need to adapt to the shell because, as you say, the shell modifies some things before sed receives the script. The best examples are the choice between single and double quotes at the shell level, and between \( and ( at the sed level. So:

  1. define the pattern (reg ex) you want
  2. adapt for the sed version/option you are using
  3. adapt for shell interpretation

As an exercise, try writing the sed substitution command that replaces a literal \{ with the literal text &/$IFS (the characters themselves, not the value of IFS), with the sed script enclosed in double quotes, in bash or ksh, with POSIX or GNU sed.
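
One possible answer to that exercise, worked through the three steps above. This is only a sketch: it assumes sed's default basic regex syntax, and the sample input a\{b and the variable name result are made up for illustration.

```shell
# Steps 1 and 2: what sed itself must receive (basic regex syntax).
#   pattern:      \\{        a literal backslash, then { (ordinary in basic syntax)
#   replacement:  \&\/$IFS   \& is a literal &, \/ a literal /, $ is ordinary here
# Step 3: inside double quotes the shell turns \\ into \ and \$ into $,
# so each backslash meant for sed is doubled and the $ is escaped.
result=$(printf '%s\n' 'a\{b' | sed "s/\\\\{/\\&\\/\$IFS/")
printf '%s\n' "$result"   # prints a&/$IFSb
```

With single quotes around the script, step 3 disappears and the script is written exactly as sed should see it: 's/\\{/\&\/$IFS/'.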

Upvotes: 0

chepner

Reputation: 531718

The general answer is that you need to escape characters that have special meaning when you want to treat them as literal characters, not for their special meaning. The rules for what characters have special meaning vary from program to program.


Your specific question involves characters that have special meaning to sed; single quotes prevent any enclosed characters from being interpreted by bash.
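
To see what the shell does before sed ever runs, it can help to print the same text under both kinds of quotes. A small sketch, with name, single and double as illustrative variable names:

```shell
name=world

# Single quotes: every character reaches the program unchanged.
single='$name \\ [0-9]\{3\}'

# Double quotes: the shell expands $name and collapses \\ into \,
# but leaves \{ alone because { is not special after a backslash here.
double="$name \\ [0-9]\{3\}"

printf 'single: %s\n' "$single"   # single: $name \\ [0-9]\{3\}
printf 'double: %s\n' "$double"   # double: world \ [0-9]\{3\}
```

Whatever survives this step is what sed then interprets under its own rules.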

In this case, you are escaping the { and } to prevent sed from interpreting them. First, consider this command:

sed '/[0-9]{3}/d' filename.txt

If you are using a version of sed that treats both [ and { specially (that is, extended regular expression syntax, which GNU and BSD sed enable with -E), this command says to delete any line that contains a sequence of 3 consecutive digits. The [0-9] is not a literal 5-character string; it's a regular expression that matches any single numeral. The {3} isn't a literal 3-character string; it's a modifier that matches exactly 3 repetitions of the preceding regular expression. Lines like the following will be matched:

593
3296

but not

34a7

because there aren't 3 digits in a row.

Now, consider your command:

sed '/[0-9]\{3\}/d' filename.txt

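The [0-9] is still a regular expression that matches a single numeral. But now you have escaped the braces. In a sed that treats { specially, as assumed above, the escaped braces are no longer a modifier for the preceding regular expression; sed treats them as the literal characters {, 3, and }. (In sed's default basic syntax the roles are reversed: \{3\} is the repetition modifier and an unescaped {3} is literal.) So, under the extended syntax, it will match lines like the following: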

0{3}
1{3}
5{3}

but not lines like

346

because there are no braces.
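
Both cases can be checked directly. A minimal sketch, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do); the variable names are illustrative and the sample lines are the ones used above.

```shell
# Sample lines taken from the examples above.
sample=$(printf '%s\n' '593' '3296' '34a7' '0{3}')

# Extended syntax, unescaped braces: {3} is a repetition count,
# so lines containing three consecutive digits are deleted.
ere_plain=$(printf '%s\n' "$sample" | sed -E '/[0-9]{3}/d')
printf 'unescaped braces keep:\n%s\n' "$ere_plain"

# Extended syntax, escaped braces: \{3\} is the literal text {3},
# so only the line with a digit followed by {3} is deleted.
ere_escaped=$(printf '%s\n' "$sample" | sed -E '/[0-9]\{3\}/d')
printf 'escaped braces keep:\n%s\n' "$ere_escaped"
```

The first command should keep 34a7 and 0{3}; the second should keep 593, 3296 and 34a7.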

Upvotes: 1

anubhava

Reputation: 785481

The difference in this behavior is specific to sed.

In its default mode sed supports only basic regular expressions, so { is matched literally unless escaped, as you noticed.

sed '/[0-9]\{3\}/d'

In extended regex mode neither [ nor { needs escaping:

sed -r '/[0-9]{3}/d'

or, on OS X:

sed -E '/[0-9]{3}/d'

[ and ] delimit a character class in both basic and extended regex modes (even the shell's glob patterns support it).
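
A quick way to compare the two modes side by side. This is a sketch that assumes a sed accepting -E (recent GNU sed and BSD sed both do); the sample lines and variable names are made up for illustration.

```shell
sample=$(printf '%s\n' '593' '34a7' '0{3}')

# Basic syntax (default): \{3\} is the repetition count.
bre_escaped=$(printf '%s\n' "$sample" | sed '/[0-9]\{3\}/d')

# Extended syntax: {3} is the repetition count, no backslashes needed.
ere_plain=$(printf '%s\n' "$sample" | sed -E '/[0-9]{3}/d')

# Basic syntax with unescaped braces: {, 3 and } are matched literally.
bre_plain=$(printf '%s\n' "$sample" | sed '/[0-9]{3}/d')

printf 'basic, escaped:\n%s\n' "$bre_escaped"
printf 'extended, unescaped:\n%s\n' "$ere_plain"
printf 'basic, unescaped:\n%s\n' "$bre_plain"
```

The first two commands should give the same result (593 is deleted), while the third deletes only the 0{3} line. In all three, [0-9] behaves as a character class.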

Upvotes: 1

brokenfoot

Reputation: 11649

I think your question pertains to special characters in regular expressions. Check this out:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03

Upvotes: 0
