Reputation: 2084
How can I set which order to match things in a PCRE regular expression?
I have a dynamic, user-supplied regular expression that is used to extract two values from a string and store them in two strings. However, there are cases where the two values appear in the string in reverse order, so whatever the first (\w+) captures needs to be stored in the second string.
Upvotes: 0
Views: 516
Reputation: 75242
If you're matching both parts with the same subpattern (like \w+), you're out of luck. But if the subpatterns are distinctively different, you have a few options, none of them very pretty. Here's a regex that uses a conditional construct to match the src and type attributes of an HTML script element in either order:
\b(?(?=src=)
src="([^"]*)"\s+type="([^"]*)"|
type="([^"]*)"\s+src="([^"]*)"
)
(DISCLAIMER: This regex makes many unrealistic assumptions, chief among them that both attributes will be present and that they'll be adjacent to each other. I'm only using it to illustrate the technique.)
If the src attribute appears first, the src and type values will be captured in the first and second groups respectively. Otherwise, they'll appear in the fourth and third groups respectively. Named groups would make it easier to keep track of things, especially if you could use the same name in more than one place, like you can in .NET regexes. Unfortunately, PCRE requires every named group to have a unique name by default (duplicates are only allowed when the PCRE_DUPNAMES option or an inline (?J) is set), which is too bad; that's a very nice feature.
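To make the group bookkeeping concrete, here is a rough sketch in Python. Python's re module has no lookahead conditional like (?(?=src=)...), so this swaps in a plain alternation; the group numbering works out the same way. The names ATTRS and extract_src_type are made up for the illustration, and the same unrealistic assumptions about the markup apply.

```python
import re

# Plain alternation stands in for the PCRE lookahead conditional,
# which Python's re module does not support.
ATTRS = re.compile(
    r'''\b(?:
        src="([^"]*)"\s+type="([^"]*)"    # groups 1 (src) and 2 (type)
      |
        type="([^"]*)"\s+src="([^"]*)"    # groups 3 (type) and 4 (src)
    )''',
    re.VERBOSE,
)


def extract_src_type(tag):
    """Return (src, type) whichever attribute comes first, or None if no match."""
    m = ATTRS.search(tag)
    if m is None:
        return None
    if m.group(1) is not None:
        # src came first: groups 1 and 2 participated in the match.
        return m.group(1), m.group(2)
    # type came first: src is in group 4 and type is in group 3.
    return m.group(4), m.group(3)
```

The caller only ever sees (src, type); the check on group 1 is what decides which pair of groups took part in the match.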
Upvotes: 1
Reputation: 81
You can extract the strings by name using (?<name>\w+) and get the values with pcre_get_named_substring.
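A minimal sketch of the same idea in Python, assuming the user-supplied pattern always defines two groups called first and second (hypothetical names chosen for this example). Note that Python spells named groups (?P<name>...) rather than (?<name>...), and m.group("name") plays the role that pcre_get_named_substring plays in the C API.

```python
import re


def extract_named(pattern, text):
    """Return the (first, second) captures by name, or None if no match.

    Assumes the pattern defines groups named "first" and "second";
    the order in which they appear in the pattern does not matter.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group("first"), m.group("second")
```

With this, whoever writes the pattern decides which capture goes where: (?P<first>\w+)=(?P<second>\w+) and (?P<second>\w+)=(?P<first>\w+) both work against the same input without any change to the calling code.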
Upvotes: 3