I have a large block of text in which some words are hyperlinked. I want to find all of the linked text and the URL of each hyperlink. Suppose my text is as follows:
LoremIpsum.Net is a small and simple static site that provides you with a decent sized passage without having to use a generator. The site also provides an all caps version of the text, as well as translations, and an explanation of what this famous.
Now I want to store each hyperlinked word and its URL in an array or hash table. Can anyone suggest an approach or provide some sample code to do this?
Thanks in advance.
Try the HtmlAgilityPack: http://www.codeplex.com/htmlagilitypack
Something like:
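using HtmlAgilityPack;
// Load the HTML and select every <a> element that has an href attribute.
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null) // SelectNodes returns null, not an empty collection, when nothing matches
{
    foreach (HtmlNode link in links)
    {
        string url = link.GetAttributeValue("href", string.Empty); // these are your hrefs!
        string text = link.InnerText; // the hyperlinked word(s)
    }
}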
You will lose your mind if you don't use a proper HTML parser.
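The question asks for an array or hash table. Here is a minimal sketch of the storage side, assuming you already have each link's text and URL (for example, from the HtmlAgilityPack loop above). It uses a `Dictionary<string, List<string>>` keyed on the linked text, because the same word may be linked more than once; a plain `Dictionary<string, string>` would throw on the second `Add` of an existing key. `LinkTable` and `Build` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LinkTable
{
    // Groups (text, url) pairs into a hash table keyed by the linked text.
    // Each key maps to a list so repeated link text keeps every URL, in input order.
    public static Dictionary<string, List<string>> Build(IEnumerable<(string Text, string Url)> links)
    {
        var table = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var (text, url) in links)
        {
            // Create the URL list the first time this text is seen.
            if (!table.TryGetValue(text, out var urls))
            {
                urls = new List<string>();
                table[text] = urls;
            }
            urls.Add(url);
        }
        return table;
    }
}
```

If you know every link text is unique, a `Dictionary<string, string>` (or simply a `List<(string Text, string Url)>`, which also preserves document order) is enough.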
See "Program that scrapes with Regex [C#]" on this page: http://www.dotnetperls.com/scraping-html
It works by running a regular expression over your text and collecting the matches.
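A minimal sketch of the regex approach, assuming links are written as `<a ... href="...">text</a>` with a quoted `href` (single or double quotes). This is not the code from the linked page; `LinkScraper`, `Scrape`, and the pattern are illustrative assumptions. It uses only `System.Text.RegularExpressions` and `System.Net.WebUtility.HtmlDecode`, strips any tags nested inside the link text, and returns the pairs in document order.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkScraper
{
    // Matches <a ... href="url" ...>text</a>. Group 1 captures the opening quote so
    // the closing quote (\1) must match it; group 2 is the URL, group 3 the link body.
    // Unquoted hrefs and malformed markup are NOT handled (assumption of this sketch).
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*([""'])(.*?)\1[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Removes markup nested inside the link body, e.g. <b>word</b>.
    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    public static List<(string Text, string Url)> Scrape(string html)
    {
        var links = new List<(string Text, string Url)>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            // Decode entities such as &amp; in both the URL and the visible text.
            string url = WebUtility.HtmlDecode(m.Groups[2].Value);
            string text = WebUtility.HtmlDecode(TagPattern.Replace(m.Groups[3].Value, "")).Trim();
            links.Add((text, url));
        }
        return links;
    }
}
```

This is fine for quick extraction from markup you control; for arbitrary real-world HTML (comments, scripts, unquoted attributes, nested anchors) a real parser such as the HtmlAgilityPack is the more robust choice.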