Reputation: 3069
The regular expression is as follows:
if ($ftxt =~ m|/([^=]+)="(.+)"|o)
{
.....
}
This regex looks different from most others I have seen. What confuses me is the "|": most regexes use "/" as the delimiter instead. The group ([^=]+) also confuses me. As far as I know, [^=] means "the start of the string" or "=", but then what does it mean to repeat '^' one or more times? How should this be read?
Upvotes: 3
Views: 329
Reputation: 5153
It is meant to match equation-like expressions and to capture the key and the value separately. Imagine you have a statement like height="30px", and you want to capture the attribute name height as well as its value 30px.
So you have m|/([^=]+)="(.+)"|.
The key is supposed to be everything before the = is encountered, so [^=]+ captures it. The ^ is a negation metacharacter when it is the first character inside [] brackets, so [^=] matches any character except =, which is what you want. The / in front of the group is a literal forward slash: because the pattern is delimited with | rather than /, it needs no escaping, and the match has to start at a / in the input (for example /height="30px"). It is not an escape for the parenthesis. Escaping is done with a backslash, and \( would match a literal opening parenthesis instead of starting a group. To capture the key, the group has to be written ([^=]+), as it is here.
Next comes the = sign, which you don't care about, and then the quotes that contain the value, which you capture with "(.+)". The .+ greedily matches every character up to the end of the line, including the final ". The regex then finds that it cannot match its own closing ", so it backtracks: .+ gives up the last " it consumed, which leaves the string within the quotes in the group. Now you can access the key and the value through $1 and $2. Cool, isn't it?
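As a minimal sketch of the capture and backtracking behaviour described above (the input strings are made up for illustration, and the /o flag is left out because the pattern contains no variables):

```perl
use strict;
use warnings;

# Hypothetical input; the pattern requires a leading "/".
my $ftxt = '/height="30px"';

my ($key, $value);
if ($ftxt =~ m|/([^=]+)="(.+)"|) {
    ($key, $value) = ($1, $2);          # $1 is the key, $2 is the value
    print "key=$key value=$value\n";    # key=height value=30px
}

# Greedy .+ runs to the end of the line, then backtracks only as far
# as the LAST double quote, so it can swallow more than one attribute.
my $two = '/a="1" b="2"';
my ($greedy) = $two =~ m|/[^=]+="(.+)"|;
print "greedy: $greedy\n";              # greedy: 1" b="2

# A negated class stops at the first closing quote instead.
my ($tight) = $two =~ m|/[^=]+="([^"]*)"|;
print "tight: $tight\n";                # tight: 1
```

If a line can hold several key="value" pairs, the "([^"]*)" form is usually the safer choice.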
Upvotes: 2
Reputation: 39950
Some regexp implementations allow you to use characters other than / as the delimiter. This is useful if you need to use / inside the regular expression itself, since you then don't have to escape it. (In and of itself, / is not a special character in regexp syntax, but it needs escaping if it is also the delimiter in the regexp literal syntax of the host language.) The docs on Perl's quote operators mention this.
This is tutorial-level stuff: square brackets ([abc]) denote a character class, which means "any one of the characters inside the brackets". (In my example, it means "either a or b or c".) Inside them, a leading ^ has a different meaning: it inverts the character class. So [^=] means "any character except =", and [^=]+ means "one or more characters that aren't =".
Quoting the docs on Perl's RE syntax:
You can specify a character class, by enclosing a list of characters in [], which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.
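The character-class rules above can be checked with a few one-line matches. This is only a sketch, and the sample strings are invented for the purpose:

```perl
use strict;
use warnings;

my $text = 'name=value';    # hypothetical sample string

# [^=]+ : one or more characters that are not "="
my ($before) = $text =~ /([^=]+)/;
print "$before\n";          # name

# Outside brackets, ^ is an anchor: the match must start at the beginning.
my $starts = $text =~ /^name/ ? 'yes' : 'no';
print "$starts\n";          # yes

# When ^ is not the first character in the class, it is a literal caret.
my @hits = ('a^b=c' =~ /[=^]/g);
print "@hits\n";            # ^ =
```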
Upvotes: 4
Reputation: 71538
You can use different delimiters instead of /. For instance, you could use:
m#/([^=]+)="(.+)"#o
Or
m~/([^=]+)="(.+)"~o
The advantage of using something other than / is that you don't have to escape slashes; otherwise, you'd have to use:
m/\/([^=]+)="(.+)"/o
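(The \/ right after the opening delimiter is the escaped slash. Writing it as [/] does not avoid the escape in Perl when / is the delimiter, because Perl locates the closing delimiter before it parses the pattern.)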
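To illustrate that the delimiter choice does not change what is matched, here is a small sketch that runs the same pattern with several delimiters against one made-up input (the m{...} form is an extra bracketing delimiter that Perl also accepts; the /o flag is omitted since it has no effect on a pattern without variables):

```perl
use strict;
use warnings;

my $ftxt = '/width="100"';    # hypothetical input

my @results;
push @results, "$1=$2" if $ftxt =~ m|/([^=]+)="(.+)"|;
push @results, "$1=$2" if $ftxt =~ m#/([^=]+)="(.+)"#;
push @results, "$1=$2" if $ftxt =~ m~/([^=]+)="(.+)"~;
push @results, "$1=$2" if $ftxt =~ m{/([^=]+)="(.+)"};
push @results, "$1=$2" if $ftxt =~ m/\/([^=]+)="(.+)"/;   # slash must be escaped here

print "$_\n" for @results;    # prints width=100 five times
```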
([^=]+) is a capture group, and inside it you have [^=]+. [^=] is a negated character class and matches any character that is not a =.
^ behaves differently at the beginning of a character class: it is not the same as ^ outside a character class, which means 'beginning of the string' (or 'beginning of line' under the /m modifier).
As for the trailing o, this is a flag I hadn't met before, so a little searching brought me to this post, from which I quote:
The /o modifier is in the perlop documentation instead of the perlre documentation since it is a quote-like modifier rather than a regex modifier. That has always seemed odd to me, but that's how it is. Before Perl 5.6, Perl would recompile the regex even if the variable had not changed. You don't need to do that anymore. You could use /o to compile the regex once despite further changes to the variable, but as the other answers noted, qr// is better for that.
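For the qr// alternative mentioned in the quote, a minimal sketch might look like this. The variable names and input lines are hypothetical; the point is only that a pattern built from a variable is compiled once and then reused:

```perl
use strict;
use warnings;

# Hypothetical: the attribute name arrives in a variable.
my $key_name = 'height';

# qr// compiles the pattern once into a reusable regex object.
# \Q...\E quotes any regex metacharacters in the interpolated value.
my $attr_re = qr{/(\Q$key_name\E)="(.+)"};

my @lines = ('/height="30px"', '/width="10px"', '/height="2em"');
my @values;
for my $line (@lines) {
    push @values, $2 if $line =~ $attr_re;
}
print "@values\n";    # 30px 2em
```

Unlike /o, the compiled object can be rebuilt explicitly if $key_name changes, which makes the behaviour easier to reason about.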
Upvotes: 6