Reputation: 4796
I'm using the following regex (which I found online) to obtain the URLs within an HTML page:
Regex regex = new Regex(@"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");
It works fine for the HTML below:
<div style="background:url(images/logo.png) no-repeat;">UK</div>
However, it returns more than I need when the HTML page contains the following JavaScript, where it matches 'destpage':
function buildurl(destpage)
I tried the following regex to require a leading colon, but it appears to be invalid:
:url\((?<char>['""])?(?<:url>.*?)\k<char>?\)
Any help would be much appreciated.
Upvotes: 0
Views: 412
Reputation: 321
Add the colon only at the front of the pattern:
:url\((?<char>['""])?(?<url>.*?)\k<char>?\)
The second "url" is the name of that capture group, not text to be matched, so it must not contain the colon.
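As a minimal sketch of how the colon-prefixed pattern behaves on the two inputs from the question (the class name CssUrlFinder and the method Find are hypothetical, chosen only for illustration):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CssUrlFinder
{
    // The pattern from the answer: a literal colon, then url( ... ).
    // "char" captures an optional opening quote; "url" captures the value.
    private static readonly Regex UrlPattern =
        new Regex(@":url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    public static List<string> Find(string html)
    {
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(html))
        {
            // Read the named group rather than the whole match.
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Called on the div from the question, Find should return images/logo.png, and it should return nothing for function buildurl(destpage) because "url(" is not preceded by a colon there. Note that this pattern also skips declarations such as "background: url(...)" (space after the colon) or "background:#fff url(...)", so it is only a narrow fix.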
Upvotes: 0
Reputation: 69362
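To get all the URLs, use the HtmlAgilityPack instead of a regex. Adapted from their example page:
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so guard against that in real code.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    // link.GetAttributeValue("href", "") gives the URL of this anchor.
}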
You can expand on that to obtain your style URLs by, for example, using //@style to get the style nodes and iterating through those to extract the url value.
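A rough sketch of that idea, assuming the HtmlAgilityPack NuGet package is referenced. The class name StyleUrlExtractor and the method Extract are hypothetical, and the sketch selects elements carrying a style attribute with //*[@style], which is an assumption on my part in place of the //@style expression mentioned above:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class StyleUrlExtractor
{
    // The original pattern from the question. It is only applied to style
    // attribute values, so script text such as buildurl(destpage) is never seen.
    private static readonly Regex UrlPattern =
        new Regex(@"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@style]");
        if (nodes == null)
        {
            return urls;
        }

        foreach (HtmlNode node in nodes)
        {
            string style = node.GetAttributeValue("style", "");
            foreach (Match m in UrlPattern.Matches(style))
            {
                urls.Add(m.Groups["url"].Value);
            }
        }
        return urls;
    }
}
```

Because the regex only ever runs on the contents of style attributes, it no longer needs the leading colon to avoid the JavaScript function. URLs inside style elements or external stylesheets are not covered by this sketch.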
Upvotes: 3