Reputation: 17498
I'm trying to craft a regex that only returns <link> tag hrefs.
Why does this regex return all hrefs, including <a> hrefs?
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
<link rel="stylesheet" rev="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">
<a href="anotherurl">Slash Boxes</a>
Upvotes: 3
Views: 15316
Reputation: 336158
What regex flavor are you using? Perl, for one, doesn't support variable-length lookbehind. Where that's an option, I'd choose (edited to implement the very good idea from MizardX):
(?<=<link\b[^<>]*?)href\s*=\s*(['"])(?:(?!\1).)+\1
as a first approximation. That way the closing quote has to be the same character (' or ") as the opening one. The same for a language without support for (variable-length) lookbehind:
(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)
Capture group 1 (\1) will contain your match.
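As a quick check of the capture-group form, here is a minimal Python sketch run against the sample markup from the question. Python is my choice for illustration only (the question does not name a flavor), and the names `LINK_HREF` and `html` are made up for this example.

```python
import re

# Capture-group form of the pattern: group 1 is the whole href attribute,
# group 2 is the opening quote, reused by the \2 backreference.
LINK_HREF = re.compile(r"""(?:<link\b[^<>]*?)(href\s*=\s*(['"])(?:(?!\2).)+\2)""")

# Sample markup taken from the question.
html = (
    '<link rel="stylesheet" rev="stylesheet" '
    'href="idlecore-tidied.css?T_2_5_0_228" media="screen">\n'
    '<a href="anotherurl">Slash Boxes</a>'
)

# Collect group 1 of every match; the <a> href should not show up here.
matches = [m.group(1) for m in LINK_HREF.finditer(html)]
print(matches)
```

Only the href inside the <link> tag is collected; the <a> tag is skipped because the pattern has to start at a literal <link.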
Upvotes: 0
Reputation: 41142
Avoid lookbehind for such a simple case; just match what you need, and capture what you want to get.
I got good results with <link\s+[^>]*(href\s*=\s*(['"]).*?\2) in The Regex Coach with the s and g options.
Upvotes: 1
Reputation: 89171
Either
/(?<=<link\b[^<>]*?)\bhref\s*=\s*(?:"[^"]*"|'[^']*'|\S+)/
or
/<link\b[^<>]*?\b(href\s*=\s*(?:"[^"]*"|'[^']*'|\S+))/
The main difference is [^<>]*? instead of .*?. This is because you don't want it to continue the search into other tags.
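To make the difference concrete, here is a small Python sketch comparing the two prefixes on a <link> that has no href, followed by an <a> that does. The test markup and the names `loose` and `strict` are assumptions for illustration, and the patterns use the capture-group form because Python's `re` module does not accept variable-length lookbehind.

```python
import re

# A <link> without an href, followed by an <a> that has one (made-up input).
html = '<link rel="stylesheet" media="screen">\n<a href="anotherurl">Slash Boxes</a>'

# ".*?" may run past the end of the <link> tag; DOTALL lets "." cross the newline.
loose = re.compile(r"""<link\s+.*?(href\s*=\s*["'][^"']+)""", re.DOTALL)

# "[^<>]*?" cannot cross a tag boundary, so the search stays inside <link ...>.
strict = re.compile(r"""<link\b[^<>]*?(href\s*=\s*["'][^"']+)""")

loose_hit = loose.search(html)
strict_hit = strict.search(html)

print(loose_hit.group(1) if loose_hit else None)  # href taken from the <a> tag
print(strict_hit)                                  # no match at all
```

The loose pattern happily reports the href of the <a> tag, which is the behavior the question complains about, while the strict one finds nothing because the <link> tag itself has no href.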
Upvotes: 3
Reputation: 83622
(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+
works with Expresso (I think Expresso runs on the .NET regex-engine). You could even refine this a bit more to match the closing ' or ":
(?<=<link\s+.*?)href\s*=\s*([\'\"])[^\'\"]+(\1)
Perhaps your regex-engine doesn't work with lookbehind assertions. A workaround would be
(?:<link\s+.*?)(href\s*=\s*([\'\"])[^\'\"]+(\2))
Your match will then be in the captured group 1.
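For what it's worth, Python's built-in `re` module is one engine where this applies: it rejects the lookbehind version at compile time, while the workaround runs fine. A minimal sketch follows; the variable names are mine, and the input is a shortened version of the <link> tag from the question.

```python
import re

# Both patterns from this answer, with the unnecessary quote escapes dropped.
lookbehind = r"""(?<=<link\s+.*?)href\s*=\s*(['"])[^'"]+(\1)"""
workaround = r"""(?:<link\s+.*?)(href\s*=\s*(['"])[^'"]+(\2))"""

# Python's re only allows fixed-width lookbehind, so compiling the first
# pattern raises re.error instead of returning a pattern object.
try:
    re.compile(lookbehind)
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

html = '<link rel="stylesheet" href="idlecore-tidied.css?T_2_5_0_228" media="screen">'

# With the workaround, the href attribute is in capture group 1.
m = re.search(workaround, html)
href_attr = m.group(1) if m else None

print(lookbehind_supported, href_attr)
```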
Upvotes: 0
Reputation: 546055
/(?<=<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
I'm a little shaky on the back-references myself, so I left that in there. This regex though:
/(<link\s+.*?)href\s*=\s*[\'\"][^\'\"]+[^>]*>/
...works in my JavaScript test.
Upvotes: 0