YodasMyDad

Reputation: 9475

Strip Specific Styles from the Style attribute in Html string using Html Agility Pack

I have an HTML string with varied content, and it includes this:

<span style="display:block;position:fixed;width:100%;height:2000px;background-color:rgba(0,0,0,0);z-index:9999!important;top:0;left:0;cursor:default;"></span>

This will seem strange, but I only want to remove specific items within the style attribute (for all HTML elements). For example, I want to remove

position:fixed and z-index:9999!important; and top:0; and left:0;

to name a few, but keep everything else. The issue is that it's not necessarily position:fixed; it could be position:absolute; or anything else, just as it could be z-index:9998; or top:20; and so on.

I need to be able to remove style declarations by their key, so position:*anything*, top:*anything*, etc., and to do this case-insensitively, so it would also catch POSITION:*anything* or PoSition:*anything*.

Is there a way to achieve this using the Html Agility Pack?

Upvotes: 6

Views: 8095

Answers (5)

My solution is to remove the old declaration from the style attribute; you can then add a replacement declaration if you want one. Here is an example:

<span style="font-size:12pt;font-family:Calibri;mso-fareast-font-family:宋体;mso-bidi-font-family:Arial;lang:EN-US;mso-fareast-language:EN-US;mso-ansi-language:AR-SA;">&nbsp;</span>

I want to remove font-family:Calibri;. Each declaration in the style attribute is separated by ";", so you just find the index of the declaration you want and the index of the nearest ";" after it, remove that range, and optionally append something else.

My example code removes font-family:Calibri and adds font-family:Times New Roman:

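// SelectNodes returns null when nothing matches, so guard before iterating.
var spans = doc.DocumentNode.SelectNodes("//span[@style]");
if (spans != null)
{
    foreach (HtmlAgilityPack.HtmlNode span in spans)
    {
        HtmlAgilityPack.HtmlAttribute att = span.Attributes["style"];
        att.Value = ChangeFontFamily(att.Value);
    }
}

private string ChangeFontFamily(string value)
{
    // Find the start of the declaration to remove, ignoring case.
    // Note: this also matches inside longer names such as "mso-bidi-font-family:".
    var idxStart = value.IndexOf("font-family:", StringComparison.OrdinalIgnoreCase);
    if (idxStart >= 0)
    {
        // Find the nearest ";" after it; if there is none, remove to the end of the string.
        var idxEnd = value.IndexOf(";", idxStart, StringComparison.Ordinal);
        value = idxEnd >= 0
            ? value.Remove(idxStart, idxEnd - idxStart + 1)
            : value.Remove(idxStart);
    }

    // Optionally append a replacement declaration and return the result.
    value += "font-family:Times New Roman;";
    return value;
}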

That is my solution. Enjoy it!!!

Upvotes: 1

syonip

Reputation: 2971

There is a very simple way of editing the style attribute in HAP, as seen in the example here: https://html-agility-pack.net/knowledge-base/12062495/better-way-to-add-a-style-attribute-to-html-using-htmlagilitypack. Note that this example appends a declaration rather than removing one:

const string margin = "margin-top: 0";
foreach (var pTagNode in pTagNodes)
{
    var styles = pTagNode.GetAttributeValue("style", null);
    var separator = (styles == null ? null : "; ");
    pTagNode.SetAttributeValue("style", styles + separator + margin);
}

Upvotes: 1

Pete Duncanson

Reputation: 3246

I think you'll just have to use HAP to grab the elements you want to clean up, grab the styles from the attribute and then loop over them to manually clean them.

I'd split on the ";" and then on the ":" to get name/value pairs. Loop over them, lowercase the name, and throw it into a switch statement with fall-through cases for the names you want to drop and a default that appends the name/value pair to a new string. Then put the new string of styles back into your attribute.

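 // Sketch only, not tested against a real document.
 // Inspired by http://htmlagilitypack.codeplex.com/wikipage?title=Examples
 HtmlDocument doc = new HtmlDocument();

 doc.Load("file.htm");
 // SelectNodes returns null when there are no matches.
 var nodes = doc.DocumentNode.SelectNodes("//span[@style]");
 if (nodes != null)
 {
    foreach (HtmlNode span in nodes)
    {
       HtmlAttribute att = span.Attributes["style"];
       att.Value = CleanStyles(att.Value);
    }
 }
 doc.Save("file.htm");

 // Elsewhere
 public string CleanStyles(string oldStyles)
 {
    string newStyles = "";
    foreach (var entry in oldStyles.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
    {
       if (entry.Trim().Length == 0) continue; // skip stray whitespace between ";"

       // Split on the first ':' only, so values that contain ':' stay intact.
       var values = entry.Split(new[] { ':' }, 2);
       switch (values[0].Trim().ToLowerInvariant())
       {
          case "position":
          case "z-index":
             // Do nothing, skip this value
             break;
          default:
             newStyles += string.Join(":", values) + ";";
             break;
       }
    }
    return newStyles;
 }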

Something like that anyway.

Upvotes: 1

Memetican

Reputation: 407

There doesn't appear to be any support for inline style string parsing in the HTML Agility Pack, but .NET does have some capabilities for this in System.Web.UI to support WebForms controls.

It's called the CssStyleCollection, and it will convert your style string into a collection of string key/value pairs and allow you to remove the specific keys you do not want.

However, since it's an internal tool for WebControl use, it doesn't have a public constructor. Instead, you have to instantiate it via reflection, or use a hack like this:

CssStyleCollection style = new Panel().Style;

Once created,

style.Value = "YOUR STYLE STRING"; 

And then remove the items you don't want;

style.Remove("position");
style.Remove("z-index");
style.Remove("top");
style.Remove("left");

Retrieve your new delimited style string from style.Value.

IMPORTANT: I haven't tested this, but the process seems simple enough, if a bit hacky. There may be some surprises I haven't come across yet. In particular, I have no idea how it handles situations where the same style property appears more than once in the same string:

top:0;margin-left:20;top:10; 

In inline style strings, browsers will respect the last specified value, so top:10 wins. However since CssStyleCollection uses unique keys, it cannot store both top values and most likely discards one.
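
If duplicate properties matter, one way to sidestep the unique-key problem is to filter the declarations in order instead of loading them into a keyed collection. The following is only a rough sketch under stated assumptions: StyleFilter and RemoveProperties are hypothetical names (they are not part of HAP or System.Web), and it assumes no property value contains a literal ";".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of HAP or System.Web.
public static class StyleFilter
{
    // Removes every declaration whose property name is listed,
    // comparing names case-insensitively and keeping the rest in order.
    public static string RemoveProperties(string style, params string[] propertyNames)
    {
        if (string.IsNullOrWhiteSpace(style)) return string.Empty;

        var blocked = new HashSet<string>(propertyNames, StringComparer.OrdinalIgnoreCase);
        var kept = new StringBuilder();

        foreach (string declaration in style.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Use the first ':' only, so values such as url(http://...) stay intact.
            int colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // not a name:value pair, drop it

            string name = declaration.Substring(0, colon).Trim();
            if (blocked.Contains(name)) continue; // skip unwanted property

            kept.Append(declaration.Trim()).Append(';');
        }

        return kept.ToString();
    }
}
```

With the string above, StyleFilter.RemoveProperties("top:0;margin-left:20;top:10;", "margin-left") keeps both top declarations in their original order, so the browser's last-one-wins behaviour is unchanged, while passing "TOP" removes both of them.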

Upvotes: 5

Charlie Afford

Reputation: 86

Use the HTML Agility Pack together with a regular expression. Match on the property name before the ":" and select the text after the ":" up to the next ";". That gives you the positions of the declarations to remove from the string. Hope that makes sense :)

You could do it the way described in the other answers, but it gets messy. Use regex if possible :)
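
As a rough illustration of the regex idea, here is a minimal sketch. StyleRegexCleaner and RemoveProperties are hypothetical names, the pattern is my own assumption rather than anything HAP provides, and it assumes no property value contains a literal ";". HAP would only be used to fetch each style attribute value and write the cleaned string back.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern below is an assumption, not a HAP feature.
public static class StyleRegexCleaner
{
    // Removes every "name: value;" declaration whose name is listed.
    public static string RemoveProperties(string style, params string[] propertyNames)
    {
        if (string.IsNullOrEmpty(style) || propertyNames.Length == 0) return style;

        // Escape each name and join them into one alternation: position|z\-index|...
        string names = string.Join("|", Array.ConvertAll(propertyNames, n => Regex.Escape(n)));

        // (?<=^|;) only allows a match at the start of the string or right after ';',
        // so "top" does not match inside "margin-top".
        // [^;]*;? consumes the value and the trailing ';' if there is one.
        string pattern = @"(?<=^|;)\s*(?:" + names + @")\s*:[^;]*;?";

        return Regex.Replace(style, pattern, string.Empty, RegexOptions.IgnoreCase);
    }
}
```

Calling StyleRegexCleaner.RemoveProperties(styleValue, "position", "z-index", "top", "left") on the span from the question should leave only display, width, height, background-color and cursor. RegexOptions.IgnoreCase covers the POSITION / PoSition case.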

Upvotes: 0
