Reputation: 31
Need to remove the <span style="color:#000000;"/> tag when the <span> tag is empty
<html>
<body>
<p left-margin="0" style="margin:0 0 0 0;text-align:Left;font-style:italic;"><span style="color:#000000;"/></p>
<p>Newly <span style="font-weight:bold;">Created</span> this document...</p>
<p />
<p>Regards,</p>
<p>Dhanush.</p>
</body>
</html>
We are already using the regex below to strip invalid XML characters:
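if (!string.IsNullOrEmpty(text))
{
    // Remove every character outside the ranges listed in the pattern
    // (intended to keep only characters that are valid in XML 1.0).
    var xmlPattern = "[^\x09\x0A\x0D\x20-\xD7FF\xE000-\xFFFD\x10000-\x10FFFF]";
    return Regex.Replace(text, xmlPattern, string.Empty);
}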
I need the final result to look like this:
<html>
<body>
<p left-margin="0" style="margin:0 0 0 0;text-align:Left;font-style:italic;"></p>
<p>Newly <span style="font-weight:bold;">Created</span> this document...</p>
<p />
<p>Regards,</p>
<p>Dhanush.</p>
</body>
</html>
Upvotes: 0
Views: 445
Reputation: 37500
Don't use Regex for any XML parsing!
Using XDocument will suffice here:
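// Requires: using System.Linq; using System.Xml.Linq;
var html = XDocument.Parse(htmlString);
// Snapshot the spans first so that removing nodes does not disturb the enumeration.
var spanElements = html.Descendants("span").ToList();
for (int i = spanElements.Count - 1; i >= 0; i--)
    // An empty Value means the span contains no text at all.
    if (spanElements[i].Value == "") spanElements[i].Remove();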
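For reference, the same idea can be wrapped in a small self-contained program. This is only a sketch under a few assumptions: the class name SpanCleaner and the method RemoveEmptySpans are made up for illustration, the input is well-formed XML with no namespace on the elements (otherwise Descendants("span") would not match), and "empty" is taken to mean a span with no child nodes at all, which is slightly stricter than checking Value.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpanCleaner
{
    // Hypothetical helper (not from the original posts): removes every
    // <span> that has no child nodes, i.e. neither text nor nested elements.
    public static string RemoveEmptySpans(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // ToList() snapshots the matches before any node is removed.
        var emptySpans = doc.Descendants("span")
                            .Where(span => !span.Nodes().Any())
                            .ToList();

        foreach (XElement span in emptySpans)
        {
            span.Remove();
        }

        return doc.ToString();
    }

    public static void Main()
    {
        // Shortened version of the markup from the question.
        string input =
            "<html><body>" +
            "<p style=\"font-style:italic;\"><span style=\"color:#000000;\"/></p>" +
            "<p>Newly <span style=\"font-weight:bold;\">Created</span> this document...</p>" +
            "</body></html>";

        Console.WriteLine(RemoveEmptySpans(input));
    }
}
```

Checking Nodes() instead of Value keeps a span that only wraps other elements (for example an image), which Value == "" would also remove. If the real documents declare the XHTML namespace, the element name would need to be qualified with an XNamespace.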
Upvotes: 1
Reputation: 9771
One approach would be to use HtmlAgilityPack instead of Regex.
To install the HtmlAgilityPack NuGet package, run this command in the Package Manager Console:
Install-Package HtmlAgilityPack -Version 1.11.4
Code:
// Requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(@"Path to html file");
// SelectNodes returns null when nothing matches, so query once and check the result.
var spans = doc.DocumentNode.SelectNodes("//span");
if (spans != null)
{
    foreach (HtmlNode node in spans)
    {
        string style = node.GetAttributeValue("style", string.Empty);
        // Remove the span only when it carries the black color style and has no content.
        if (style.Contains("color:#000000;") && string.IsNullOrWhiteSpace(node.InnerHtml))
        {
            node.ParentNode.RemoveChild(node);
        }
    }
}
string html = doc.DocumentNode.OuterHtml;
Upvotes: 1