Neal.Marlin

Reputation: 508

Why don't escape characters and regex work well with the sed command?

In my case, I want to split one line into words with sed, using the commands below, all of which I thought should work.

[heping@Laputa:~]$echo "abc  def    gks       dps" | sed "s/\s+/\n/g"
abc  def    gks       dps
[heping@Laputa:~]$echo "abc  def    gks       dps" | sed "s/\s\{1,\}/\n/g"
abc  def    gkn       dpn
[heping@Laputa:~]$echo "abc  def    gks       dps" | sed "s/ \{1,\}/\n/g"
abcndefngksndps
[heping@Laputa:~]$echo "abc  def    gks       dps" | sed "s/ \{1,\}/:/g"
abc:def:gks:dps
[heping@Laputa:~]$echo "abc  def    gks       dps" | sed "s/ +/:/g"
abc  def    gks       dps

But actually, only one works.

[heping@Laputa:~]$echo "abc  def    gks       dps" | sed "s/ \{1,\}/:/g"
abc:def:gks:dps

It seems that the \s character class and the + special character in regex do not work well with the sed command, and \n is not recognized as a newline. Could anyone tell me why, or give me a clue? Thank you.

Upvotes: 0

Views: 1356

Answers (2)

Ed Morton

Reputation: 203219

sed uses Basic Regular Expressions (BREs) by default, while the metacharacter + comes from Extended Regular Expressions (EREs). The shorthand \s for the POSIX character class [[:space:]] only works in some seds (e.g. GNU sed) as an extension. Similarly, \n only means "newline" in some seds, whereas in any sed you can use a backslash followed by a literal newline character.

Your use of double quotes (") instead of single quotes (') around your script exposes it to the shell, which can require extra backslash escapes. Always use single quotes around strings or scripts unless you have a very specific need for double quotes (e.g. to let a variable expand), and always use double quotes rather than none unless you have a very specific need for no quotes (e.g. to allow globbing wildcard expansion).

To do what you want in any POSIX sed:
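As a small illustration of the quoting point, the sketch below prints a script exactly as sed would receive it under each quoting style, then shows the one case where double quotes are wanted. The variable names (single, double, sep, replaced) and the sample script s/a\\b/X/ are made up for the demonstration.

```shell
# Single quotes: the shell passes the text through untouched.
single='s/a\\b/X/'
# Double quotes: the shell collapses \\ into one backslash before sed sees it.
double="s/a\\b/X/"
printf 'single-quoted: %s\n' "$single"   # single-quoted: s/a\\b/X/
printf 'double-quoted: %s\n' "$double"   # double-quoted: s/a\b/X/

# A legitimate use of double quotes: letting a shell variable expand.
sep=':'
replaced=$(echo 'abc  def' | sed "s/[[:space:]][[:space:]]*/$sep/g")
printf '%s\n' "$replaced"                # abc:def
```

The two printf lines differ by one backslash, which is the extra escaping that double quotes would force you to add back.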

$ echo 'abc  def    gks       dps' | sed 's/[[:space:]][[:space:]]*/\
/g'
abc
def
gks
dps
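The question's own \{1,\} interval is also part of POSIX BREs, so an equivalent sketch can keep that form and only swap in [[:space:]] and the escaped literal newline. The variable name words is just for the demonstration.

```shell
# BRE interval \{1,\} means "one or more", the portable stand-in for ERE's +.
# The replacement is a backslash followed by a real newline.
words=$(echo 'abc  def    gks       dps' | sed 's/[[:space:]]\{1,\}/\
/g')
printf '%s\n' "$words"
```

This should print the four words on separate lines, the same as the [[:space:]][[:space:]]* version.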

With GNU sed you can instead write the following (note the -E, which enables EREs for +; -E is supported by both GNU sed and OSX/BSD sed, but of those two only GNU sed supports \s and \n):

$ echo 'abc  def    gks       dps' | sed -E 's/\s+/\n/g'
abc
def
gks
dps
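If a script has to run on machines where the sed flavour is unknown, one possible approach is to probe for \s support at run time and fall back to [[:space:]]. This is only a sketch: the probe string and the variable names space_class and joined are assumptions for illustration, not an established idiom.

```shell
# Probe: does this sed treat \s as whitespace? (GNU sed does; others may not.)
if [ "$(printf 'a b\n' | sed 's/\s/_/' 2>/dev/null)" = 'a_b' ]; then
  space_class='\s'
else
  space_class='[[:space:]]'   # POSIX character class, works in any sed
fi

# Double quotes are needed here so the variable expands inside the script.
joined=$(echo 'abc  def    gks       dps' | sed "s/${space_class}${space_class}*/:/g")
printf '%s\n' "$joined"       # abc:def:gks:dps
```

In practice it is usually simpler to write [[:space:]] everywhere and skip the probe.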

Upvotes: 3

Amadan

Reputation: 198314

There are several problems. First of all, sed uses basic regular expressions by default, which do not recognise +. Use the -E option to enable extended regular expressions, which do.

Second, not every sed recognises \n in the replacement, but you can use bash's ANSI C quoting ($'...') to turn \n into a real line break. However, a bare line break would simply split your sed script, so you have to escape the line break with a backslash to make sed treat it literally. Inside $'...', that escaping backslash is written \\ and the line break is written \n, for a total of three backslashes.

Finally, \s as a character class is also not recognised by vanilla sed (but it is available in GNU sed, which Linux distributions use). Use a literal space instead if you need compatibility with e.g. OSX (or brew install gnu-sed).

echo "abc  def    gks       dps" | sed -E $'s/ +/\\\n/g'
# => abc
#    def
#    gks
#    dps
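To see why three backslashes are needed, it can help to print the ANSI C-quoted script before handing it to sed. The sketch below assumes a shell with $'...' support such as bash; the variable names script and result are only for the demonstration.

```shell
# Requires a shell with ANSI C quoting ($'...'), e.g. bash.
# Inside $'...': \\ becomes one backslash, \n becomes a real newline.
script=$'s/ +/\\\n/g'
printf '%s\n' "$script"
# The script sed receives is a backslash followed by a real newline:
#   s/ +/\
#   /g

result=$(echo "abc  def    gks       dps" | sed -E "$script")
printf '%s\n' "$result"
```

The printed script is the same backslash-plus-newline form that a single-quoted, two-line sed script would contain.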

Upvotes: 2
