yasmuru

Reputation: 1186

Fetching URL Based on InnerText C#

I would like to fetch a URL from an HTML document (or an HTML string) based on the inner text of the link.

For example:

<a href="http://www.itsmywebaddress.com">My Website</a>.
<a href="http://www.everythingisforgood.com">good</a>.

Here, I need to fetch the URL based on the inner text of "My Website" (which we provide as input).

Can anyone tell me what the Regex for this would be, or how to do it with HtmlAgilityPack?

I have used the following Regex. However, it fetches the text inside the "a" tag rather than the URL for a given inner text.

Regex.Match(str, @"<a [^>]*>(.*?)</a>").Groups[1].Value;

Thanks in advance.

Upvotes: 1

Views: 209

Answers (2)

Damith

Reputation: 63065

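// Requires: using System.Linq; using HtmlAgilityPack;
// url is the page address; str is the link text to look for, e.g. "My Website".
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load(url);
// Note: SelectNodes returns null when the page has no matching <a> elements.
var hrefs = doc.DocumentNode.SelectNodes("//a[@href]")
             .Where(link => link.InnerText.Trim() == str)
             .Select(l => l.Attributes["href"].Value).ToList();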
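Since the question also mentions working from a string, here is a minimal sketch of the same idea using LoadHtml instead of HtmlWeb. The class and method names (LinkFinder, FindHrefsByText) are made up for illustration, and the case-insensitive comparison is an assumption about what is wanted, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkFinder
{
    // Hypothetical helper: returns the href of every <a> whose visible text equals linkText.
    public static List<string> FindHrefsByText(string html, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            // Decode entities such as &amp; and trim whitespace before comparing.
            .Where(a => string.Equals(
                HtmlEntity.DeEntitize(a.InnerText).Trim(),
                linkText,
                StringComparison.OrdinalIgnoreCase))
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .ToList();
    }
}
```

Calling LinkFinder.FindHrefsByText(html, "My Website") on the two links from the question should give a list containing only http://www.itsmywebaddress.com.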

Upvotes: 1

Kinexus

Reputation: 12904

If you are using HtmlAgilityPack, you should be able to read the href attribute directly without having to use a regex.

Something like this should work:

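HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load(url);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    // Read the href of each anchor; compare link.InnerText with the input to pick one.
    HtmlAttribute att = link.Attributes["href"];
    string href = att.Value;
}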
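
The question also asked for a regex. For small, predictable snippets like the two links in the question, a rough sketch could look like the following. The names RegexLinkFinder and FindHrefByText are hypothetical, and the pattern assumes a quoted href and plain text (no nested tags) inside the anchor; for arbitrary HTML a parser is the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexLinkFinder
{
    // Hypothetical helper: returns the first href whose anchor text equals linkText, or null.
    public static string FindHrefByText(string html, string linkText)
    {
        // Regex.Escape keeps characters such as '.' or '(' in the input
        // from being interpreted as pattern syntax.
        string pattern =
            @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""'][^>]*>\s*" +
            Regex.Escape(linkText) +
            @"\s*</a>";

        // Group 1 captures the value between the quotes of the href attribute.
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The difference from the pattern in the question is that the capture group sits on the href value and the inner text is part of the pattern, so only the anchor with the requested text can match.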

Upvotes: 0
