Reputation: 1186
I would like to fetch a URL from an HTML document or string, based on the inner text of the link.
For example:
<a href="http://www.itsmywebaddress.com">My Website</a>.
<a href="http://www.everythingisforgood.com">good</a>.
Here, I need to fetch the URL whose inner text is "My Website" (which we provide as input).
Can anyone tell me what the Regex code for this is, or how to do it with HtmlAgilityPack?
I have used the following Regex call. However, it fetches the values inside the "a" tag instead of the URL.
Regex.Match(str, @"<a [^>]*>(.*?)</a>").Groups[1].Value;
Thanks in advance.
Upvotes: 1
Views: 209
Reputation: 63065
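// url is the page address; str is the link text to look for (e.g. "My Website").
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load(url);
// InnerText compares only the visible text, ignoring any nested markup.
// Note: SelectNodes returns null when the page has no matching links.
var hrefs = doc.DocumentNode.SelectNodes("//a[@href]")
    .Where(link => link.InnerText.Trim() == str)
    .Select(l => l.Attributes["href"].Value).ToList();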
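A self-contained variant of the same idea, as a minimal sketch: it parses an HTML string with LoadHtml instead of downloading a page, so it can be tried on the sample markup from the question. The class name LinkLookup and the method FindHref are hypothetical names chosen for illustration, and the trimming and case-insensitive comparison are assumptions about how the input text should match.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LinkLookup
{
    // Returns the href of the first <a> whose visible text equals linkText,
    // or null when no link matches. (Hypothetical helper, not part of the library.)
    public static string FindHref(string html, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return null;
        }

        return anchors
            // Decode entities such as &amp; and trim whitespace before comparing (assumption).
            .Where(a => string.Equals(
                HtmlEntity.DeEntitize(a.InnerText).Trim(),
                linkText,
                StringComparison.OrdinalIgnoreCase))
            .Select(a => a.Attributes["href"].Value)
            .FirstOrDefault();
    }
}
```

With the two sample links from the question as the html argument, LinkLookup.FindHref(html, "My Website") should pick out the first link's address.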
Upvotes: 1
Reputation: 12904
If you are using HtmlAgilityPack, you should be able to read the href attribute directly without having to use a regex.
Something like this should work:
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load(url);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    // att.Value holds the URL; link.InnerText holds the text to compare against the input.
    HtmlAttribute att = link.Attributes["href"];
}
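Since the question also asks for a regex, here is a rough sketch of how the pattern could capture the href rather than the inner text. It assumes simple markup like the sample (a quoted href, link text with no nested tags); regexes are fragile on real-world HTML, so a parser is usually the safer choice. RegexLinkLookup and FindHref are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexLinkLookup
{
    // Returns the href of the first <a> whose text equals linkText, or null if none matches.
    // Assumes a quoted href and plain text between <a> and </a>.
    public static string FindHref(string html, string linkText)
    {
        // Regex.Escape makes characters in linkText match literally.
        string pattern =
            @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""'][^>]*>\s*"
            + Regex.Escape(linkText)
            + @"\s*</a>";

        Match m = Regex.Match(html, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Group 1 is the URL captured between the quotes after href=.
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The difference from the pattern in the question is that the capturing group sits around the href value, and the input text is embedded in the pattern so only the matching link is returned.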
Upvotes: 0