Coding Duchess

Reputation: 6929

Setting InnerHtml property using HtmlAgilityPack produces unexpected results

I am using HtmlAgilityPack and C# in order to convert older IE tags as well as Javascript to be compatible with other browsers. Here is an example:

Old code:

<script for="thisForm" event="onsubmit()" language="JScript">

var Checked = false
var Counter = 0

for (;Counter < this.choice.length; Counter++)
{
    if (this.choice[Counter].checked)
    {
        Checked = true
        this.action = this.choice[Counter].value
    }
}

if (!Checked)
{
    alert ("Please make a selection")
    return false
}
</script>

I convert to:

<script type="text/JScript">
function thisForm_onsubmit(el)
{
var Checked = false
var Counter = 0

for (;Counter < el.choice.length; Counter++)
{
    if (el.choice[Counter].checked)
    {
        Checked = true
        el.action = el.choice[Counter].value
    }
}

if (!Checked)
{
    alert ("Please make a selection")
    return false
}
}
</script>

What I did above was remove the for, event, and language attributes from the script tag, add a type="text/JScript" attribute, and wrap the JavaScript in a function.
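To make the intent of the conversion concrete, here is a minimal sketch of how the wrapped function could be exercised outside a browser. The function body is the converted script from above; the alert stub and the two form-like objects (emptyForm, pickedForm, with made-up values a.asp and b.asp) are hypothetical stand-ins, not real DOM nodes.

```javascript
// Stand-in for the browser's alert(): records messages so the sketch runs in Node.
const alerts = [];
function alert(message) {
    alerts.push(message);
}

// The converted handler: every "this" from the old script is now the parameter "el".
function thisForm_onsubmit(el) {
    var Checked = false;
    var Counter = 0;

    for (; Counter < el.choice.length; Counter++) {
        if (el.choice[Counter].checked) {
            Checked = true;
            el.action = el.choice[Counter].value;
        }
    }

    if (!Checked) {
        alert("Please make a selection");
        return false;
    }
}

// Hypothetical form-like objects with a "choice" collection, as the script expects.
const emptyForm = {
    action: "",
    choice: [{ checked: false, value: "a.asp" }, { checked: false, value: "b.asp" }]
};
const pickedForm = {
    action: "",
    choice: [{ checked: false, value: "a.asp" }, { checked: true, value: "b.asp" }]
};

const emptyResult = thisForm_onsubmit(emptyForm);   // nothing selected
const pickedResult = thisForm_onsubmit(pickedForm); // second choice selected
```

With nothing selected the function records the alert message and returns false; with a selection it copies that choice's value into el.action and falls through without an explicit return value.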

I do it by simply adding attributes to the HtmlNode and then replacing the InnerHtml property value. This worked fine until I encountered the script above. Instead of the result above, I get the following:

<script type="text/JScript">
function thisForm_onsubmit(el)
{
var Checked = false
var Counter = 0

for (;Counter < el.choice.length; counter++)
{
    if (el.choice[counter].checked)
    {
        checked = true
        el.action = el.choice[counter].value
    }
}

if (!checked)
{
    alert ("please make a selection")
    return false
}

}
  el.choice.length;="" counter++)="" {="" if="" (el.choice[counter].checked)="" {="" checked="true" el.action="el.choice[Counter].value" }="" }="" if="" (!checked)="" {="" alert="" ("please="" make="" a="" selection")="" return="" false="" }="" }=""></ el.choice.length; counter++)
{
    if (el.choice[counter].checked)
    {
        checked = true
        el.action = el.choice[counter].value
    }
}

if (!checked)
{
    alert ("please make a selection")
    return false
}

}
></script>

The strange part is that the text I am assigning to InnerHtml is correct, but scriptNode.InnerHtml shows a different value afterwards.

Here is my C# code:

if (scriptNode.Attributes["for"] != null)
{
    // read the attributes and the body of the old script
    ctrl = scriptNode.Attributes["for"].Value;

    if (scriptNode.Attributes["event"] != null)
        evt = scriptNode.Attributes["event"].Value;

    if (scriptNode.Attributes["type"] != null)
        typ = scriptNode.Attributes["type"].Value;

    if (scriptNode.Attributes["language"] != null)
        lang = scriptNode.Attributes["language"].Value;

    if (scriptNode.InnerHtml != null)
        code = scriptNode.InnerHtml;

    // build the function header, e.g. "function thisForm_onsubmit(el)"
    func_name = ctrl + "_" + evt;
    if (ctrl != "window")
        new_script = Environment.NewLine + "function " + RemoveBrackets(func_name) + "(el)" + Environment.NewLine;
    else
        new_script = Environment.NewLine + "function " + AddBrackets(RemoveBrackets(func_name)) + Environment.NewLine;
    new_script += "{" + Environment.NewLine;

    // append the old code with "this" replaced by "el", then close the function
    new_script += "\r\n" + ReplaceThis(sFile, ctrl, evt, code, "this", "el") + "\r\n" + "}" + "\r\n";

    // remove the "for" and "event" attributes
    scriptNode.Attributes["for"].Remove();
    if (scriptNode.Attributes["event"] != null)
        scriptNode.Attributes["event"].Remove();

    // remove the deprecated "language" attribute
    // and replace it with a "type" attribute
    if (scriptNode.Attributes["language"] != null)
        scriptNode.Attributes["language"].Remove();
    if (scriptNode.Attributes["type"] == null)
        scriptNode.Attributes.Add("type", "text/" + lang);

    // replace the old JavaScript with the function code
    // HERE the new_script variable contains the correct value, but when I check
    // scriptNode.InnerHtml after the assignment, it shows the messed-up code.
    scriptNode.InnerHtml = new_script;
}

It is very strange and I can't seem to find a solution.

I have tried using HtmlEncode

scriptNode.InnerHtml = HtmlDocument.HtmlEncode(new_script);

That produced the correct script, as in the second example above, but replaced every <, > and " with &lt;, &gt; and &quot;.

So the result was:

<script type="text/JScript">
function thisForm_onsubmit(el)
{

var Checked = false
var Counter = 0

for (;Counter &lt; el.choice.length; Counter++)
{
    if (el.choice[Counter].checked)
    {
        Checked = true
        el.action = el.choice[Counter].value
    }
}

if (!Checked)
{
    alert (&quot;Please make a selection&quot;)
    return false
}

}
</script>
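The encoded output is not a usable script. Assuming the page is served as HTML, where the contents of a script element are not entity-decoded, the JavaScript engine sees the literal text &lt;. The sketch below checks this on the loop header from the script. Both helpers are hypothetical: htmlEncodeSketch only mimics the substitutions seen in the output above and is not the library's HtmlEncode, and parses compiles a source string without running it.

```javascript
// Hypothetical stand-in for the encoding step: only the substitutions visible above.
function htmlEncodeSketch(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Returns true if the source compiles as a function body, false on a syntax error.
function parses(source) {
    try {
        new Function(source); // compiles the body without executing it
        return true;
    } catch (error) {
        if (error instanceof SyntaxError) return false;
        throw error;
    }
}

// The loop header from the script, with an empty body so it is a complete statement.
const originalLine = "for (;Counter < el.choice.length; Counter++) {}";
const encodedLine = htmlEncodeSketch(originalLine);

const originalParses = parses(originalLine);
const encodedParses = parses(encodedLine);
```

The encoded header reads "Counter & lt; el.choice.length; Counter++" to the parser, which gives the for statement one semicolon too many, so it no longer compiles.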

I thought of using InnerText instead of InnerHtml, which would make more sense since what I am changing is not really HTML, but the InnerText property is read-only.

Can anyone shed some light on why this is happening and if there is a workaround?

Upvotes: 2

Views: 951

Answers (1)

har07

Reputation: 89335

The modified script contains the special character <, which I strongly suspect is the cause of the problem. < can easily be misinterpreted as the first character of an opening HTML tag, especially when the text is assigned through the InnerHtml property.

Here is one possible workaround. Assume that new_script is a string variable containing the modified JavaScript, including the opening and closing tags (<script type="text/JScript"></script>). You can load new_script into a new HtmlDocument, then replace the old script in the first HtmlDocument with the new script node from the second HtmlDocument instance:

.....
var newDoc = new HtmlDocument();
newDoc.LoadHtml(new_script);
var newScript = newDoc.DocumentNode.SelectSingleNode("//script");
scriptNode.ParentNode.ReplaceChild(newScript, scriptNode);

dotnetfiddle demo

Upvotes: 1
