Saravanan

Reputation: 11592

How to remove the <br> tag in my HTML string using HtmlAgilityPack in C#?

I have an HTML string that I am parsing with HtmlAgilityPack.

This is my HTML string:

<p class="Normal-P" style="direction: ltr; unicode-bidi: normal;"><span class="Normal-H">sample<br/></span> <span class="Normal-H">texting<br></span></p>

This HTML string contains a <br> tag in two places (once written as <br/> and once as <br>). How can I remove both of them?

Upvotes: 3

Views: 3554

Answers (2)

VladL

Reputation: 13033

string html = ...;
// Reassign rather than redeclare html; the pattern covers <br>, <br/>, <br /> and <BR>
html = Regex.Replace(html, @"<br\s*/?>", "", RegexOptions.IgnoreCase);
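
If you stay with the regex route, it may help to wrap the replacement in a small helper that also tolerates br tags carrying attributes. This is only a sketch: BrStripper and StripBr are names made up for this example, and the pattern assumes no attribute value contains a ">" character. A regex does not understand HTML, so it will also strip matches inside comments or script blocks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrStripper
{
    // Matches <br>, <br/>, <br /> and <BR>, including br tags that carry attributes.
    // \b keeps the pattern from matching longer tag names that merely start with "br".
    private static readonly Regex BrTag =
        new Regex(@"<br\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the input with every matched br tag removed.
    public static string StripBr(string html)
    {
        return BrTag.Replace(html, string.Empty);
    }
}
```

For example, BrStripper.StripBr("sample<br/>texting<br>") returns "sampletexting".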

Upvotes: 1

Cristian Lupascu

Reputation: 40516

It's as easy as:

  • loading the HTML fragment into an Agility Pack HtmlDocument
  • getting all <br /> tags using the "//br" XPath expression
  • removing the tags obtained at the previous step using the Remove() method
  • inspecting the result in the DocumentNode.OuterHtml property

Here it is in code:

const string htmlFragment =
    @"<p class=""Normal-P"" style=""direction: ltr; unicode-bidi: normal;"">" +
    @"<span class=""Normal-H"">sample<br/></span>" +
    @"<span class=""Normal-H"">texting<br></span></p> ";

var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlFragment);

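// SelectNodes returns null (not an empty collection) when nothing matches
var brTags = document.DocumentNode.SelectNodes("//br");
if (brTags != null)
    foreach (var brTag in brTags)
        brTag.Remove();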

Console.WriteLine(document.DocumentNode.OuterHtml);
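
As an alternative to XPath, the same steps can be sketched with the Descendants method, which yields an empty sequence rather than null when no br tags are present. This assumes the HtmlAgilityPack NuGet package is referenced; HtmlCleaner and RemoveBrTags are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlCleaner
{
    // Parses the fragment, removes every br element, and returns the resulting HTML.
    public static string RemoveBrTags(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Snapshot the matches with ToList() first: removing nodes while
        // lazily enumerating Descendants would change the tree mid-iteration.
        var brTags = document.DocumentNode.Descendants("br").ToList();
        foreach (var brTag in brTags)
            brTag.Remove();

        return document.DocumentNode.OuterHtml;
    }
}
```

Calling HtmlCleaner.RemoveBrTags(htmlFragment) with the fragment above should give back the same markup without either br tag.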

Upvotes: 5
