Reputation:
This regex
(<link\s+)((rel="[Ii]con"\s+)|(rel="[Ss]hortcut [Ii]con"\s+))(href="(.+)")(.+)/>
works for
<link rel="icon" href="http://passets-cdn.pinterest.com/images/favicon.png" type="image/x-icon" />
<link rel="shortcut icon" href="http://css.nyt.com/images/icons/nyt.ico" />
<link rel="shortcut icon" href="http://cdn.sstatic.net/careers/Img/favicon.ico?36da6b" />
<link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />
but not when the href and rel attributes are switched:
<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />
How can I update it so that the order of the alternatives does not matter, so that
aa || bb
works just as well as
bb || aa
I just want to pull the path from the favicon tag...I've chosen not to use a library.
Stema's answer in different form:
<link\s+
(?=[^>]*rel="
    (?:[Ss]hortcut\s)?
    [Ii]con"\s+
)
(?:[^>]*href="
    (.+?)
    "
).*
/>
Upvotes: 1
Views: 90
Reputation: 93026
You could do it with a positive lookahead
<link\s+(?=[^>]*rel="(?:[Ss]hortcut\s)?[Ii]con"\s+)(?:[^>]*href="(.+?)").*/>
You will find the path in the first capturing group.
The key point is that the lookahead does not consume any characters. It only checks whether rel="(?:[Ss]hortcut\s)?[Ii]con" occurs somewhere within the tag. If that pattern is found, the rest of the regex matches the href part and puts the link into capturing group 1.
(?=[^>]*rel="(?:[Ss]hortcut\s)?[Ii]con"\s+) is the positive lookahead assertion. That is indicated by the ?= at the start of the group.
[^>] is a negated character class that matches any character except >. I use it to ensure that the match does not run past the closing > of the tag.
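As a rough illustration of the lookahead pattern above, here is a minimal Python sketch. The choice of Python and the helper name favicon_href are assumptions made for this example; the regex itself is the one from this answer, and the tags are taken from the question plus one made-up stylesheet tag.

```python
import re

# Pattern from this answer. The lookahead only checks that a matching rel
# attribute exists somewhere in the tag; it does not consume any text.
FAVICON_RE = re.compile(
    r'<link\s+(?=[^>]*rel="(?:[Ss]hortcut\s)?[Ii]con"\s+)(?:[^>]*href="(.+?)").*/>'
)


def favicon_href(tag):
    """Return the href of a favicon <link> tag, or None if it does not match.

    favicon_href is a hypothetical helper name used only in this sketch.
    """
    match = FAVICON_RE.search(tag)
    return match.group(1) if match else None


if __name__ == "__main__":
    tags = [
        # rel before href
        '<link rel="shortcut icon" href="http://css.nyt.com/images/icons/nyt.ico" />',
        # href before rel (the case the original regex missed)
        '<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />',
        # not a favicon tag (made-up example), so no match is expected
        '<link rel="stylesheet" href="/main.css" />',
    ]
    for tag in tags:
        print(favicon_href(tag))
```

Note that the pattern still expects whitespace after the rel attribute and a self-closing />, so tags written differently would need a looser pattern.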
Upvotes: 3
Reputation: 30414
You can use one regex to locate the icon tag and a second regex to pull the path.
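If the only text that your second regex parses is a single tag, it can be as simple as /href="([^"]+)"/
(the negated class keeps the capture from running past the closing quote into later attributes), and the order of attributes within the tag will not matter.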
Upvotes: 2
Reputation: 43178
You cannot, not with a single regular expression. Well, you actually can, but it is really not worth it, and you will end up with an unreadable mess of a regex.
Match against /<link(\s[^>]*rel="(shortcut\s+)?icon"[^>]*)>/i
and then match the captured part against /\shref="([^"]+)"/i.
The capture includes the leading whitespace, so either rel or href may be the first attribute.
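The two-step approach can be sketched in Python as follows. This is only an illustration under a few assumptions: the language choice and the name favicon_path are mine, and the first pattern is a slight variant of the one above. It captures the whitespace after <link and uses [^>]* instead of [^>]+, so that a tag whose first attribute is rel, or whose first attribute is href, is still handled.

```python
import re

# Step 1: find a <link> tag whose rel is "icon" or "shortcut icon".
# Variant of the pattern above: the capture starts at the whitespace after
# "<link" so the second pattern's leading \s can still match.
LINK_RE = re.compile(r'<link(\s[^>]*rel="(?:shortcut\s+)?icon"[^>]*)>', re.IGNORECASE)

# Step 2: pull the href value out of the captured attribute text.
HREF_RE = re.compile(r'\shref="([^"]+)"', re.IGNORECASE)


def favicon_path(html):
    """Return the favicon href found in html, or None.

    favicon_path is a hypothetical helper name used only in this sketch.
    """
    tag = LINK_RE.search(html)
    if tag is None:
        return None
    href = HREF_RE.search(tag.group(1))
    return href.group(1) if href else None


if __name__ == "__main__":
    print(favicon_path('<link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />'))
    print(favicon_path('<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />'))
```

Because each pattern does one job, attribute order stops mattering, and both expressions stay short enough to read.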
Upvotes: 4