Weird sed greedy regex behavior

Question

I am trying to understand the behavior of sed wrt this regex:
sed -n "s/.*Directory \([^>]*\)>/\1/p" /etc/apache2/sites-enabled/*

The goal here is to list the path to the webroot of all the enabled virtual hosts in Apache2.

The weird thing is that the result of this sample command:
sed -n "s/.*Directory \([^>]*\)>/\1/p" <<< "" is as expected: /var/www/my_site

But the result of sed -n "s/.*Directory \([^>]*\)/\1/p" <<< "" is: /var/www/my_site>

I know that the difference is the presence of >. The question is: why is it necessary to add > to obtain the correct output? [^>]* should be able to match everything and stop at >, thus not capturing it in the parentheses.

I don't understand why the '>' character is caught in the first command and not in the second one. [^>] should have excluded '>' from the capturing parentheses...

Kent · Accepted Answer

In your first command, the pattern matches the whole string, so the whole string is replaced by the contents of group 1, and you get: /var/www/my_site

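In your second command, the pattern matches everything up to, but not including, the trailing >. Note that this is not the whole string. The matched part is replaced by the same capture group, but the trailing > lies outside the match, so sed leaves it untouched. That is why you see it in the output.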
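To compare the two commands side by side, here is a minimal sketch. The input line is an assumption, chosen to look like an Apache Directory line that would produce the outputs quoted in the question; swap in a line from your own config. It uses printf and a pipe instead of a here-string so it also runs in a plain POSIX shell.

```shell
# Assumed sample input, modeled on an Apache <Directory ...> line.
line='<Directory /var/www/my_site>'

# The pattern includes the trailing '>', so the match covers the whole
# line and the whole line is replaced by group 1.
with_gt=$(printf '%s\n' "$line" | sed -n 's/.*Directory \([^>]*\)>/\1/p')

# The pattern stops just before '>', so '>' is outside the match
# and is printed unchanged after the replacement.
without_gt=$(printf '%s\n' "$line" | sed -n 's/.*Directory \([^>]*\)/\1/p')

printf '%s\n' "$with_gt"     # /var/www/my_site
printf '%s\n' "$without_gt"  # /var/www/my_site>
```

In both cases the capture group holds the same text; only the extent of the match differs.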



Another example:

$ sed -n "s/fo*\([^o]*\)/\1/p" <<< "foooooowhatever this ooo will leave behind"
whatever this ooo will leave behind


In the example above, the text that gets replaced is "foooooowhatever this " (including the trailing space), the replacement is "whatever this ", and the rest of the string is left untouched.
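One way to check which part of a line a pattern actually matched is to wrap the match in brackets with &, which in a sed replacement stands for the entire matched text. A small sketch using the same input string (the variable names are only illustrative):

```shell
input='foooooowhatever this ooo will leave behind'

# '&' expands to the whole matched text, so the brackets
# mark exactly where the match starts and ends.
marked=$(printf '%s\n' "$input" | sed 's/fo*[^o]*/[&]/')

printf '%s\n' "$marked"  # [foooooowhatever this ]ooo will leave behind
```

Everything outside the brackets was never part of the match, which is why s/// passes it through unchanged.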
