Dillinger

Reputation: 341

Get <element> inner content by class with Regex

I am trying to get the inner text (myTEXT) of every <td> element that has the class myClass.

Example: <td class="myClass" colspan="3">myTEXT</td>

I tried something like "Using regex to get text between multiple HTML tags", but I also need to filter by myClass.

I am new to lookahead. I was able to match using (?=(<td.*)class="myClass".*?>){1}(.*?)<\/td>, but the match includes the <td(...)> and </td>.

So my question is: how can I get only the text inside every <td> that uses the myClass class?

Upvotes: 1

Views: 114

Answers (1)

snowbear

Reputation: 26

You could use Html Agility Pack for this instead of a regex: http://html-agility-pack.net/

It supports XPath, so you can query the document like this:

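HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is the string that holds your markup

// Select every <td> whose class attribute is exactly "myClass"
HtmlAgilityPack.HtmlNodeCollection col = doc.DocumentNode.SelectNodes("//td[@class='myClass']");
if (col != null) { // SelectNodes returns null when nothing matches
    foreach (var node in col) {
        Console.WriteLine(node.InnerText); // text only, without the <td> tags
    }
}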
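
If you would rather stay with a regex, a capture group is probably simpler than a lookahead: match the whole cell, but read only group 1. The following is just a rough sketch, assuming the class attribute is written exactly as class="myClass" with double quotes and that the cells are not nested. The CellText class, the Extract method and the sample string are names I made up for illustration, and a regex like this will break on markup that a real parser handles fine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CellText
{
    // Hypothetical helper: returns the content of every <td>
    // whose class attribute is written exactly as class="myClass".
    public static List<string> Extract(string html)
    {
        // The tags are part of the match, but only group 1 (the content) is captured.
        var pattern = new Regex(
            "<td\\b[^>]*\\bclass=\"myClass\"[^>]*>(.*?)</td>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var result = new List<string>();
        foreach (Match m in pattern.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        // Sample input built from the example in the question (assumed, not real data).
        string html = "<tr><td class=\"myClass\" colspan=\"3\">myTEXT</td>"
                    + "<td class=\"other\">skip</td></tr>";

        foreach (string text in Extract(html))
        {
            Console.WriteLine(text); // prints: myTEXT
        }
    }
}
```

Reading m.Groups[1].Value instead of m.Value is what leaves out the <td ...> and </td> parts.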

I hope this helps.

Upvotes: 1
