mak

Reputation: 1404

Weird sed greedy regex behavior

I am trying to understand how sed behaves with this regex:
sed -n "s/.*Directory \([^>]*\)>/\1/p" /etc/apache2/sites-enabled/*

The goal here is to list the path to the webroot of all the enabled virtual hosts in Apache2.
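As a rough sketch of that goal, the same expression can be run over a couple of made-up vhost files. The temporary directory, the file names site_a.conf and site_b.conf, and their contents are hypothetical stand-ins for /etc/apache2/sites-enabled/*, not real Apache defaults.

```shell
# Hypothetical vhost files in a temp dir, standing in for /etc/apache2/sites-enabled/
confdir=$(mktemp -d)

printf '%s\n' '<VirtualHost *:80>' \
    '    DocumentRoot /var/www/site_a' \
    '    <Directory /var/www/site_a>' \
    '    </Directory>' \
    '</VirtualHost>' > "$confdir/site_a.conf"

printf '%s\n' '<VirtualHost *:80>' \
    '    <Directory /var/www/site_b>' \
    '    </Directory>' \
    '</VirtualHost>' > "$confdir/site_b.conf"

# Same expression as in the question: -n plus the p flag prints only lines
# where the substitution happened, reduced to the path captured by \1.
# "</Directory>" lines never match because "Directory" must be followed by a space.
roots=$(sed -n 's/.*Directory \([^>]*\)>/\1/p' "$confdir"/*)
printf '%s\n' "$roots"

rm -r "$confdir"
```

This prints /var/www/site_a and /var/www/site_b, one per line, in glob order.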

The weird thing is that the result of this sample command:
sed -n "s/.*Directory \([^>]*\)>/\1/p" <<< "<Directory /var/www/my_site>"
is, as expected: /var/www/my_site

But the result of:
sed -n "s/.*Directory \([^>]*\)/\1/p" <<< "<Directory /var/www/my_site>"
is: /var/www/my_site>
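The two sample commands can be reproduced side by side. This sketch pipes the line through printf instead of using a bash here-string (<<<), so it should also run under a plain POSIX sh; the variable names are only for illustration.

```shell
line='<Directory /var/www/my_site>'

# Pattern ends with a literal ">": the whole line is matched and replaced by \1.
with_gt=$(printf '%s\n' "$line" | sed -n 's/.*Directory \([^>]*\)>/\1/p')

# Pattern stops right after the group: the final ">" is never matched, so it survives.
without_gt=$(printf '%s\n' "$line" | sed -n 's/.*Directory \([^>]*\)/\1/p')

printf '%s\n' "$with_gt"     # /var/www/my_site
printf '%s\n' "$without_gt"  # /var/www/my_site>
```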

I know that the difference is the presence of >. My question is why it is necessary to add > to get the correct output: [^>]* should match everything up to the > and stop there, so > should never end up inside the capturing parentheses.

I don't understand why the '>' character disappears from the output of the first command but not from the output of the second one. [^>] should already have excluded '>' from the capturing parentheses...

Upvotes: 1

Views: 110

Answers (2)

anubhava

Reputation: 784958

The first command has > in the search pattern, but the second one doesn't.

[^>]* matches everything up to, but not including, the first >, so in your second sed command nothing matches the > and it remains in the output:

sed -n "s/.*Directory \([^>]*\)/\1/p"
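One way to see this is to replace the match with & (the whole matched text) wrapped in brackets, so the brackets mark exactly where the match starts and ends. This is a small illustrative sketch; the brackets are arbitrary markers.

```shell
line='<Directory /var/www/my_site>'

# "&" in the replacement stands for the entire matched text.
marked=$(printf '%s\n' "$line" | sed -n 's/.*Directory \([^>]*\)/[&]/p')
printf '%s\n' "$marked"
# [<Directory /var/www/my_site]>
```

The closing bracket lands before the >, showing that the > was never part of the match and is simply copied through.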

Also note in first command:

sed -n "s/.*Directory \([^>]*\)>/\1/p"

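Here too, \1 does not capture >. The > is matched by the literal > after the group, so it is part of the text being replaced, and because the replacement is just \1, it disappears from the output.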
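To see the difference between the capture group and the whole match in the first command, both can be printed in one replacement. A minimal sketch; the group=/match= labels are just literal text in the replacement.

```shell
line='<Directory /var/www/my_site>'

# \1 is the captured group, & is everything the pattern matched.
parts=$(printf '%s\n' "$line" | sed -n 's/.*Directory \([^>]*\)>/group=[\1] match=[&]/p')
printf '%s\n' "$parts"
# group=[/var/www/my_site] match=[<Directory /var/www/my_site>]
```

The > is inside the match but outside the group, so a replacement of just \1 drops it.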

Upvotes: 1

Kent

Reputation: 195029

In your first command, the pattern matches the whole string, which is replaced by the contents of group 1, so you get: /var/www/my_site

In your second command, you replace only <Directory......site (note: not the whole string, because the ending > is not part of the match) with the same capture group. The ending > is left untouched, so you see it in the output.

Another example:

$ sed -n "s/fo*\([^o]*\)/\1/p" <<< "foooooowhatever this ooo will leave behind" 
whatever this ooo will leave behind

In the above example, the matched text is "foooooowhatever this " and the replacement is "whatever this "; the rest of the string is left untouched.
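The same bracket trick makes the boundary visible here: o* consumes all the leading o's, [^o]* stops just before the first o of "ooo", and everything after that point is outside the match. An illustrative sketch:

```shell
text='foooooowhatever this ooo will leave behind'

# Wrap the whole match (&) in brackets to see exactly what the s command replaces.
marked=$(printf '%s\n' "$text" | sed -n 's/fo*\([^o]*\)/[&]/p')
printf '%s\n' "$marked"
# [foooooowhatever this ]ooo will leave behind
```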

Upvotes: 2
