Harikrishna

Reputation: 4305

Built-in Regex class or a parser: how to extract text between the tags of an HTML file?

I have an HTML file that contains a table and other information, and I am reading it from my C#/.NET application.

I want to parse the table contents for only some columns. Should I use an HTML parser or the Replace method of the Regex class in .NET?

If I use a parser, how do I use it? Will the parser extract the information that is between the tags? If so, please show an example, because I am new to parsers.

If I use the Replace method of the Regex class, how do I pass it the name of the file I want to extract the information from?

Edit: I want to extract information from the table in the HTML file. How can I use the HTML Agility Pack for that? What kind of code should I write to use that parser?

Upvotes: 0

Views: 384

Answers (2)

Mark Byers

Reputation: 839114

You just asked an almost identical question and deleted it. Here was the answer I gave before:


Try the HTML Agility Pack.

Here's an example:

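 // Requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // Select every <a> element that has an href attribute.
 // Note: SelectNodes returns null when nothing matches.
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
     HtmlAttribute att = link.Attributes["href"];
     att.Value = FixLink(att); // FixLink is your own helper that returns the corrected URL
 }
 doc.Save("file.htm");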

As for your extra question about regex: do not use Regex to parse HTML. It is not a robust solution. The above library can do a much better job.
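
For the table case in the question, a minimal sketch along the same lines might look like the following. It assumes the HtmlAgilityPack NuGet package is installed; the `TableReader` class, the `ReadColumns` method, and the sample HTML are hypothetical names and data made up for illustration, not part of the library. It loads the markup, walks each table row, and keeps only the requested columns.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class TableReader // hypothetical helper, not part of the library
{
    // Returns the text of the chosen (zero-based) columns for each data row.
    public static List<string[]> ReadColumns(string html, params int[] columns)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // use doc.Load("file.htm") to read from disk instead

        var result = new List<string[]>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("./td");
            if (cells == null) continue; // skip rows without <td>, e.g. header rows

            var values = new string[columns.Length];
            for (int i = 0; i < columns.Length; i++)
            {
                int c = columns[i];
                // Decode entities such as &amp; and trim surrounding whitespace.
                values[i] = c < cells.Count
                    ? HtmlEntity.DeEntitize(cells[c].InnerText).Trim()
                    : string.Empty;
            }
            result.Add(values);
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        // Sample markup, invented for this example.
        const string html = @"<table>
  <tr><th>Name</th><th>Age</th><th>City</th></tr>
  <tr><td>Ann</td><td>31</td><td>Oslo</td></tr>
  <tr><td>Raj</td><td>27</td><td>Pune</td></tr>
</table>";

        // Keep only the first and third columns.
        foreach (string[] row in TableReader.ReadColumns(html, 0, 2))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

The XPath `//table//tr` matches rows in every table, including nested ones; if the file has several tables, narrowing the expression (for example by an `id` attribute on the table you want) is probably the first thing to adjust.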

Upvotes: 4

Arnis Lapsa

Reputation: 47647

HtmlAgilityPack....

Next time, search for an existing answer before asking. This is a duplicate for sure.

Little tutorial.

Upvotes: 1
