mon

Reputation: 22254

sed: why does a POSIX bracket expression need to be in another bracket?

Question

Why does a POSIX expression such as [:space:] need to be inside another [ ]?

$ echo "a b c" | sed 's/[:space:]*/_/g'
_ _b_ _

$ echo "a b c" | sed 's/[[:space:]]*/_/g'
_a_b_c_

$ echo "a b c" | sed 's/[[:space:]][[:space:]]*/_/g'
a_b_c

Update

Regular Expressions/POSIX Basic Regular Expressions

Character classes
The POSIX standard defines some classes or categories of characters as shown below. These classes are used within brackets.

I had not understood what a character class was and assumed it was a special sequence matching any white space, so I believed 's/[:space:]/_/g' would match the space in "a b". I now suppose that '[:space:]' by itself would not match any character (please correct me if this is still wrong).

I suppose [:space:] is like ' \t\n\r\f\v' but has no function by itself. Inside brackets, as '[[:space:]]', it functions the same as '[ \t\n\r\f\v]'.

Upvotes: 0

Views: 81

Answers (1)

Ed Morton

Reputation: 203645

You need to understand the terminology:

A bracket expression is a set of characters enclosed in [ and ] and can be used as such in a regexp. That set of characters can be represented by any combination of any of the following (and an optional initial ^ negation character):

  1. A character list, e.g. abcd...z, or
  2. A character range, e.g. a-z, or
  3. A character class, e.g. [:lower:]
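As a rough illustration of the three forms above, here is a minimal sketch using sed. The sample strings and variable names are made up for this example, and LC_ALL=C is set only so that the range and class contents are predictable:

```shell
# 1. Character list: matches any one of a, b, c, d.
list=$(printf 'abcdef\n' | LC_ALL=C sed 's/[abcd]/_/g')

# 2. Character range: matches any one char from a through c.
range=$(printf 'abcXYZ\n' | LC_ALL=C sed 's/[a-c]/_/g')

# 3. Character class: matches any one lower case letter.
class=$(printf 'abcXYZ\n' | LC_ALL=C sed 's/[[:lower:]]/_/g')

# Optional initial ^ negates the set: any char that is NOT lower case.
negated=$(printf 'abcXYZ\n' | LC_ALL=C sed 's/[^[:lower:]]/_/g')

printf '%s\n' "$list" "$range" "$class" "$negated"
```

In the C locale this should print ____ef, ___XYZ, ___XYZ and abc___, one per line.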

So [:space:] is a character class (representing all white space chars) and it can be used within a bracket expression [...] in a regexp just as if you had specifically listed all white space chars within that bracket expression. So this:

[:space:]

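is just a character class (used on its own as a regexp it is parsed as a bracket expression containing the chars :, s, p, a, c, and e, which is why a and c were replaced in your first command), while this: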

[[:space:]]

is a bracket expression which includes all white space chars and this:

[[:space:][:lower:]_#;A-D]

is a bracket expression which includes all white space chars plus all lower case letters plus the chars _, #, and ; plus the letters in the range A through D (whatever those chars are in your locale).
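To see those last two bracket expressions in action, here is a small sketch. The input strings and variable names are invented for illustration, and LC_ALL=C is assumed so that A-D means exactly A, B, C, D:

```shell
# [[:space:]] matches a tab as well as a blank, since both are white space.
spaces=$(printf 'a\tb c\n' | LC_ALL=C sed 's/[[:space:]]/_/g')

# The combined expression matches white space, lower case letters,
# the chars _ # ; and the letters A through D; Z is in none of those sets.
mixed=$(printf 'a B#;Z_x\n' | LC_ALL=C sed 's/[[:space:][:lower:]_#;A-D]/./g')

printf '%s\n' "$spaces" "$mixed"
```

In the C locale this should print a_b_c and then .....Z.. since only Z falls outside the combined set.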

Upvotes: 1
