Mohammad Arshad Alam

Reputation: 9862

Extract heading text from HTML text

I have a textarea that uses the TinyMCE editor to act as a rich text editor. I want to extract the text of every heading (H1, H2, etc.) without any style or formatting.
Suppose that txtEditor.InnerText gives me a value like the one below:

<p><span style="font-family: comic sans ms,sans-serif; color: #993366; font-size: large; background-color: #33cccc;">This is before heading one</span></p>
<h1><span style="font-family: comic sans ms,sans-serif; color: #993366;">Hello This is Headone</span></h1>
<p>this is before heading2</p>
<h2>This is heading2</h2>

I want to get a list containing only the text of the heading tags. Any suggestions or guidance would be appreciated.

Upvotes: 1

Views: 2670

Answers (2)

Antonio Bakula

Reputation: 20693

Use HtmlAgilityPack; then it's easy. This collects the text of every h1 element (repeat with "h2" and so on for the other levels):

  var doc = new HtmlDocument();
  doc.LoadHtml(txtEditor.InnerText);
  var h1Elements = doc.DocumentNode.Descendants("h1").Select(nd => nd.InnerText);
  string h1Text = string.Join(" ", h1Elements);
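
Since the question asks for a list covering every heading level, the same approach can be extended beyond h1. The following is only a minimal sketch: it assumes the HtmlAgilityPack NuGet package is referenced, and ExtractHeadings is a hypothetical helper name, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper: returns the plain text of every h1..h6 element, in document order.
List<string> ExtractHeadings(string html)
{
    var headingNames = new HashSet<string> { "h1", "h2", "h3", "h4", "h5", "h6" };

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    return doc.DocumentNode.Descendants()
        // HtmlNode.Name is the lower-cased tag name, so <H1> and <h1> both match.
        .Where(node => headingNames.Contains(node.Name))
        // InnerText drops nested tags such as <span>; DeEntitize turns &amp; back into &.
        .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim())
        .ToList();
}

// Sample input shortened from the question.
string sample =
    "<p><span style=\"color: #993366;\">This is before heading one</span></p>" +
    "<h1><span style=\"color: #993366;\">Hello This is Headone</span></h1>" +
    "<p>this is before heading2</p>" +
    "<h2>This is heading2</h2>";

List<string> headings = ExtractHeadings(sample);
foreach (string heading in headings)
{
    Console.WriteLine(heading);
}
```

Returning a list rather than a joined string keeps each heading separate, which is closer to what the question asks for.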

Upvotes: 3

Chris Ayers

Reputation: 49

Referencing the question "Regular Expression to Read Tags in HTML",
I believe this is close to what you are looking for (html here is your txtEditor.InnerText value):

// Matches h1 through h6; the heading body is captured in the TagText group.
string headingRegex = "<h[1-6][^>]*?>(?<TagText>.*?)</h[1-6]>";

MatchCollection mc = Regex.Matches(html, headingRegex);
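
The matches still have to be read, and the captured text can contain inner tags such as the span in the question's h1. Below is a rough sketch of one way to finish the job. ExtractHeadingsWithRegex is a hypothetical helper name, and the pattern is my own variant of the one above (it adds a backreference so the closing tag must match the opening level). Regular expressions remain fragile on malformed or nested HTML, so treat this as an illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts heading text with a regex, then strips inner tags.
List<string> ExtractHeadingsWithRegex(string html)
{
    // Level captures the digit so that <h1> is only closed by </h1>, not </h2>.
    const string headingPattern =
        @"<h(?<Level>[1-6])\b[^>]*>(?<TagText>.*?)</h\k<Level>\s*>";

    var results = new List<string>();
    var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    foreach (Match match in Regex.Matches(html, headingPattern, options))
    {
        string inner = match.Groups["TagText"].Value;
        // Remove nested markup such as <span style="...">, keeping only the text.
        string withoutTags = Regex.Replace(inner, "<[^>]+>", string.Empty);
        // Decode entities such as &amp; and trim surrounding whitespace.
        results.Add(WebUtility.HtmlDecode(withoutTags).Trim());
    }

    return results;
}

// Sample input shortened from the question.
string sampleHtml =
    "<p>this is before heading one</p>" +
    "<h1><span style=\"color: #993366;\">Hello This is Headone</span></h1>" +
    "<p>this is before heading2</p>" +
    "<h2>This is heading2</h2>";

List<string> regexHeadings = ExtractHeadingsWithRegex(sampleHtml);
foreach (string heading in regexHeadings)
{
    Console.WriteLine(heading);
}
```

RegexOptions.Singleline lets a heading span several lines, and IgnoreCase covers upper-case tags like <H1>.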

Upvotes: 0
