sed regular expression not matching

Question

I have an XML file that looks like this:


    Awesome Group
    
    2013-04-04
    False
    7

I'm trying to print everything between <Icon> and </Icon> with this command:

$ sed -n '/\/ p' file.xml

Notice I'm escaping the open and close brackets as well as the forward slash before the close bracket. This returns no matches, which I find odd.

What's even more odd is that this command works:

$ sed -n '// p' file.xml

Why does this command work, since I'm not escaping the open and close brackets?

EDIT

ruakh helpfully pointed out that there are different implementations of sed, and that open and close brackets don't need to be escaped (I thought sed used Perl syntax for regular expressions). I found another post on Unix & Linux that was also helpful: https://unix.stackexchange.com/questions/32907/what-characters-do-i-need-to-escape-when-using-sed-in-a-sh-script

Now I'm having a problem matching a multi-line regular expression. How come this doesn't work?

$ sed -n -r '/^[\S\s]*?<\/Icon>$/ p' file.xml

I've tried with and without the -r (extended mode), with and without the ^ and $, and using .* instead of [\S\s]*, all with no matches.

ruakh · Accepted Answer

In sed, < and > have no special meaning, but \< and \> sometimes do: in some implementations, they mean "start of word" and "end of word". For example, this Bash command:

{ echo a ; echo ba ; echo b a ; } | sed -n '/\<a/ p'



will, on some systems, print a and b a (where there's an a at the very start of a word), but not ba (where there isn't).
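
To make the contrast concrete, here is a minimal sketch that runs both forms against two hypothetical input lines. The tag name Icon is only an illustration, and the behavior of the escaped form is what GNU sed does; other implementations may treat \< and \> differently.

```shell
# Hypothetical input: one line with a literal tag, one with the bare word.
lines='<Icon>
Icon'

# Unescaped < and > are ordinary characters, so only the tag line matches.
printf '%s\n' "$lines" | sed -n '/<Icon>/ p'

# With GNU sed, \< and \> are word boundaries, so this matches any line
# containing the word "Icon", with or without the surrounding brackets.
printf '%s\n' "$lines" | sed -n '/\<Icon\>/ p'
```

So escaping the brackets does not make them "more literal"; where the escapes are supported, it changes the pattern into a different one.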

(Judging from the tags you've chosen, you may be used to Perl? Perl makes a future-proof guarantee that \, when it precedes a non-word character, will always escape it. For example, < has no special meaning, but \< is guaranteed to mean < anyway. But not all regex engines take that approach.)



Edit for edited question:

Sed processes one line at a time (that's part of what makes it a "stream editor"), so a multiline regex is essentially doomed to failure. However, in your case, you don't actually need a multiline regex; you just want to find the line that contains <Icon> and the (distinct) line that contains </Icon>, and print all lines between the two (inclusive). For that, you can use an address range, specifying a start-address of /<Icon>/ and an end-address of /<\/Icon>/:

sed -n '/<Icon>/,/<\/Icon>/ p'


(See §3.2 "Selecting lines with sed" in the GNU sed user's manual.)
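
As a rough illustration of the address range, the sketch below runs it on a small made-up document. The element names other than Icon, and the layout of the sample, are assumptions for the example and are not taken from your file.

```shell
# Hypothetical sample; only the <Icon> ... </Icon> lines should be printed.
sample='<Group>
<Icon>
  <Name>Awesome Group</Name>
</Icon>
<Date>2013-04-04</Date>
</Group>'

# Printing starts at the first line matching /<Icon>/ and stops after
# the next line matching /<\/Icon>/ (both end lines are included).
printf '%s\n' "$sample" | sed -n '/<Icon>/,/<\/Icon>/ p'
```

Each regex is still tested against a single line; the range only remembers that it has been opened until the closing line turns up. If both tags sit on the same line, the end address is only checked on later lines, so the range would keep printing until a later line contains the closing tag (or the input ends).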
