Reputation: 1404
I am trying to understand the behavior of sed with respect to this regex:
sed -n "s/.*Directory \([^>]*\)>/\1/p" /etc/apache2/sites-enabled/*
The goal here is to list the path to the webroot of all the enabled virtual hosts in Apache2.
The weird thing is that the result of this sample command:
sed -n "s/.*Directory \([^>]*\)>/\1/p" <<< "<Directory /var/www/my_site>"
is as expected: /var/www/my_site
But the result of sed -n "s/.*Directory \([^>]*\)/\1/p" <<< "<Directory /var/www/my_site>"
is: /var/www/my_site>
I know that the difference is the presence of >. The question is: why is it necessary to add > to obtain the correct output? [^>]* should be able to match everything and stop at >, thus not capturing it in the parentheses.
I don't understand why the '>' character is caught in the first command and not in the second one. [^>] should have excluded '>' from the capturing parentheses...
Upvotes: 1
Views: 110
Reputation: 784958
The first command has > in the search pattern, but the second one doesn't. [^>]* matches everything up to, but not including, the first >. Hence the > is not part of the match and remains in the output of your second sed command, which is:
sed -n "s/.*Directory \([^>]*\)/\1/p"
Also note that in the first command:
sed -n "s/.*Directory \([^>]*\)>/\1/p"
\1 is not capturing >. The > is matched outside the group, so your sed command omits it in the replacement.
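To see exactly which part of the line each pattern matches, one option is to wrap the match in brackets instead of replacing it with \1. This is only a minimal sketch: it relies on & standing for the whole matched text in the replacement, and it uses printf with a pipe in place of <<< so that it also runs in a plain POSIX shell. The variable names are made up for illustration.

```shell
input='<Directory /var/www/my_site>'

# & in the replacement stands for the whole matched text, so the brackets
# mark exactly the part of the line that s/// would replace.
without_gt=$(printf '%s\n' "$input" | sed 's/.*Directory \([^>]*\)/[&]/')
with_gt=$(printf '%s\n' "$input" | sed 's/.*Directory \([^>]*\)>/[&]/')

echo "$without_gt"   # [<Directory /var/www/my_site]>
echo "$with_gt"      # [<Directory /var/www/my_site>]
```

In the first result the > sits outside the brackets, which is why it survives the substitution. In the second it is inside the match and is replaced along with everything else.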
Upvotes: 1
Reputation: 195029
In your first line, you replace the whole string with the contents of group 1, so you get: /var/www/my_site
In your second line, you replace only <Directory......site
Note that this is not the whole string: the capture group is the same, but the trailing > was not part of the match, so it was left untouched. That is why you see it in the output.
Another example:
$ sed -n "s/fo*\([^o]*\)/\1/p" <<< "foooooowhatever this ooo will leave behind"
whatever this ooo will leave behind
In the example above, the text being replaced is: foooooowhatever this (everything up to, but not including, the next o)
The replacement is: whatever this
The rest of the string is left untouched.
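If the goal is to drop whatever follows the captured group, the pattern has to match that remainder as well. One possible way, shown here as a sketch rather than the only fix, is to append .* so the match runs to the end of the line. The variable names are illustrative, and printf with a pipe stands in for <<< so the snippet also works in a plain POSIX shell.

```shell
input='<Directory /var/www/my_site>'

# The trailing .* extends the match to the end of the line, so nothing
# after the captured group is left over in the output.
path=$(printf '%s\n' "$input" | sed -n 's/.*Directory \([^>]*\).*/\1/p')

echo "$path"   # /var/www/my_site
```

This gives the same result as putting the literal > in the pattern, but it also discards any other text that might follow the > on the same line.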
Upvotes: 2