Reputation: 6154
I've been doing a lot of reading on .NET regular expressions, and I have written a regular expression whose behavior I can't make any sense of.
(src|href)="\w+|(\w+/)+
The way I read this regular expression:
This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.
Input:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
Using this regular expression with this input, there is one match:
org/1999/
Can anyone explain this? Neither src nor href appears anywhere in the input, so how can there be any match at all?
Upvotes: 3
Views: 163
Reputation: 30922
What's happening here is that the | is separating the regex into two completely separate alternatives. That is, it matches either: (src|href)="\w+
OR (\w+/)+
of which the second is the one being matched:
org/1999/
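The split can be reproduced in a few lines. This is a minimal sketch in Python rather than .NET, on the assumption that Python's `re` module treats this particular pattern the same way; the variable names are illustrative only.

```python
import re

# The pattern from the question, unchanged. Because | has the lowest
# precedence, it reads as:  (src|href)="\w+   OR   (\w+/)+
pattern = re.compile(r'(src|href)="\w+|(\w+/)+')

# The three input lines from the question.
html = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head>'
)

# Full text of every match, and what group 1 (src|href) captured each time.
matches = [m.group(0) for m in pattern.finditer(html)]
src_or_href = [m.group(1) for m in pattern.finditer(html)]

print(matches)      # ['org/1999/']
print(src_or_href)  # [None]
```

Group 1 comes back empty because the match was produced entirely by the right-hand alternative, which never looks for src or href.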
In your case you'd probably need to wrap the last part in parentheses to make it clear what exactly the alternation | applies to:
(src|href)="(\w+|(\w+/)+)
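A quick way to check the grouped version is to run it against the original input and a couple of sample tags. Again this is a sketch in Python, assuming the pattern behaves the same there as in .NET; `first_match` and the sample tags are made up for illustration. The `LONGEST_FIRST` pattern is not part of the fix above: it is a hypothetical variant that swaps the two alternatives, included because alternation tries the left branch first and `\w+` on its own already succeeds after one folder name.

```python
import re

# Grouped pattern: the alternation now only applies after src=" or href=".
GROUPED = re.compile(r'(src|href)="(\w+|(\w+/)+)')

# Hypothetical variant (an assumption, not from the thread): try the
# slash-terminated form first so nested folders are consumed.
LONGEST_FIRST = re.compile(r'(src|href)="((\w+/)+|\w+)')


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


html = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head>'
)

print(first_match(GROUPED, html))                                 # None
print(first_match(GROUPED, '<img src="Folder">'))                 # src="Folder
print(first_match(GROUPED, '<a href="Folder/SubFolder/">'))       # href="Folder
print(first_match(LONGEST_FIRST, '<a href="Folder/SubFolder/">')) # href="Folder/SubFolder/
```

With the grouping in place the stray org/1999/ match disappears. Whether the order of the alternatives matters depends on whether you need the full path or just the first folder.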
Btw I used Expresso to help work this out.
Upvotes: 6