Askiitians

Reputation: 291

Find hyperlinked text and URL

I have a large block of text in which some words are hyperlinked. I want to find all of the hyperlinked text and the URL each one points to. Suppose my text is as below:

LoremIpsum.Net is a small and simple static site that provides you with a decent sized passage without having to use a generator. The site also provides an all caps version of the text, as well as translations, and an explanation of what this famous.

Now I want to store each hyperlinked word and its URL in an array or hash table. Can anyone suggest an approach or provide some sample code to do this?

Thanks in advance.

Upvotes: 1

Views: 193

Answers (2)

Ian G

Reputation: 30234

Try the Html Agility Pack: http://www.codeplex.com/htmlagilitypack

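Something like:

 // requires: using HtmlAgilityPack; using System.Collections.Generic;
 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 var links = new Dictionary<string, string>();
 // SelectNodes returns null when no node matches the XPath
 var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
 if (anchors != null)
 {
     foreach (HtmlNode link in anchors)
     {
         // key: the hyperlinked text, value: its URL
         links[link.InnerText] = link.GetAttributeValue("href", "");
     }
 }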

You will lose your mind if you don't use a proper HTML parser.

Upvotes: 0

Roy Dictus

Reputation: 33139

See "Program that scrapes with Regex [C#]" on this page: http://www.dotnetperls.com/scraping-html

It basically works by running a regular expression over your text and collecting the matches.
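In case the link goes stale, here is a minimal sketch of that idea. It is not the code from the linked page; the class name `LinkExtractor`, the method `Extract`, and the pattern itself are my own assumptions for illustration. It assumes quoted `href` values and anchors that are not nested, and like any regex approach it can be fooled by unusual markup (for example, an attribute such as `data-href` would also match).

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LinkExtractor
{
    // Matches <a ... href="URL" ...>TEXT</a> with single or double quotes.
    // Assumption: href values are quoted and anchors are not nested.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(['""])(?<url>.*?)\1[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns a dictionary keyed by the visible link text, with the URL as value.
    // If the same text is linked more than once, the last URL wins.
    public static Dictionary<string, string> Extract(string html)
    {
        var links = new Dictionary<string, string>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            // Drop inline tags (e.g. <b>) inside the anchor, then decode entities.
            string text = Regex.Replace(m.Groups["text"].Value, "<[^>]+>", "").Trim();
            string url = WebUtility.HtmlDecode(m.Groups["url"].Value);
            links[WebUtility.HtmlDecode(text)] = url;
        }
        return links;
    }
}
```

Calling `LinkExtractor.Extract(html)` on your text gives you the hash table you asked for. If one piece of text can legitimately point at several URLs, a `List<KeyValuePair<string, string>>` would be a better fit than a dictionary.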

Upvotes: 1
