LP13

Reputation: 34149

How to safely store html and js in the database?

We have an ASP.NET Core 6 MVC application that stores HTML and JavaScript in a SQL database. On page load, it renders the stored HTML template and executes the stored script (yes, it is an application requirement that users can create their own HTML templates).

We use the Ace editor for HTML and JS editing.

HtmlTemplate and Js are two hidden fields; the Ace editor writes its value to these fields whenever anything changes.

<script type="text/javascript">
  $(function () {
    // HTML
    var _htmleditor = ace.edit("editor");
    var _template = $("#HtmlTemplate");
    configureHtmlTemplate();

    function configureHtmlTemplate() {
      // some more configuration goes here
      // Copy every editor change into the hidden HtmlTemplate field.
      _htmleditor.getSession().on("change", function () {
        _template.val(_htmleditor.getSession().getValue());
      });

      // Load the stored template into the editor.
      _htmleditor.getSession().setValue(_template.val());
    }

    // JS
    var _jsEditor = ace.edit("jsEditor");
    var _js = $("#Js");
    configureJsTemplate();

    function configureJsTemplate() {
      // some more configuration goes here
      // Copy every editor change into the hidden Js field.
      _jsEditor.getSession().on("change", function () {
        _js.val(_jsEditor.getSession().getValue());
      });

      // Load the stored script into the editor.
      _jsEditor.getSession().setValue(_js.val());
    }
  });
</script>

The server-side SaveTemplate action method saves the HTML and JS to the database; later, another user action invokes the Render method:

    [HttpPost]
    [Route("items/{id}/templates")]
    public async Task<ActionResult> SaveTemplate([FromRoute(Name = "id")] int itemID, [FromForm] EditTemplateModel model)
    {
        await _templateService.SaveTemplate(new ItemTemplate()
        {
            Id = model.Id,
            HtmlTemplate = model.HtmlTemplate,
            Js = model.Js
        });
        
        return View(model);
    }
            
    [HttpGet]
    [Route("items/{id}/render")]
    public async Task<IActionResult> Render([FromRoute(Name = "id")] int itemID)
    {
        var template = await _templateService.GetByID(itemID);

        var model = new RenderModel()
        {
            ItemID = itemID,
            Html = template.HtmlTemplate,
            Js = template.Js
        };

        return View(model);
    }
    

Render.cshtml:

    @model RenderModel
    
    <form method="post" id="renderForm" asp-action="SaveStuff" asp-controller="ItemTemplates" asp-route-id="@Model.ItemID">
        @Html.Raw(Model.Html)           
        <button  id="btnTest" class="btn btn-primary mt-3" type="submit">Submit</button>
    </form>
    
    <script type="text/javascript">    
        @Html.Raw(Model.Js)
    </script>

Rightfully so, Checkmarx reports this as vulnerable to XSS attacks, since we are using @Html.Raw().

From Microsoft's "Prevent XSS in ASP.NET Core":

The Razor engine used in MVC automatically encodes all output sourced from variables, unless you work really hard to prevent it doing so.

But in my application, I have to render the stored HTML in the browser, so I am using @Html.Raw().

The general accepted practice is that encoding takes place at the point of output and encoded values should never be stored in a database.

But if I encode the template before outputting it, the following code does not produce the expected result: instead of showing bold text, it renders the literal text <b>Foo Bar</b>.

@{
    var htmlStoredInDB = "<b>Foo Bar</b>";
    var untrustedInput = System.Text.Encodings.Web.HtmlEncoder.Default.Encode(htmlStoredInDB);
}

@Html.Raw(untrustedInput)
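
The effect described above can be reproduced outside Razor with a few lines of JavaScript. This is only an illustrative sketch: `encodeHtml` is a hypothetical helper, not the implementation of HtmlEncoder, and it covers just the five characters that matter for this example.

```javascript
// Hypothetical helper that mimics what an HTML encoder does to markup characters.
// The ampersand is replaced first so later entities are not double-encoded.
function encodeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const htmlStoredInDB = "<b>Foo Bar</b>";
const encoded = encodeHtml(htmlStoredInDB);

console.log(encoded); // &lt;b&gt;Foo Bar&lt;/b&gt;
```

Once the angle brackets are turned into entities, the browser no longer sees a tag, so it shows the characters instead of applying bold formatting. Encoding neutralizes markup; it does not decide which markup is safe to keep.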

What's the solution here? Is there a utility available in .NET 6 to sanitize HTML and JS before rendering, or is there a better option?

Update 1

I am thinking of a two-step approach here (I am still open to other suggestions):

  1. Sanitize the HTML before saving it to the database, to remove any malicious code.

  2. The sanitization in step 1 only works for the HTML template, not for the JS template, because it is difficult to distinguish bad JS from good JS. A sandboxing technique can isolate the HTML and JS content from the main window and minimize the blast radius. Sandboxing may require more work and testing, and our existing templates may need refactoring. I noticed that JSFiddle uses the sandboxing approach.
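
One way the sandboxing idea in step 2 might look is an iframe with the `sandbox` attribute and the template supplied through `srcdoc`. The sketch below is an assumption about how this could be wired up, not a tested design for this application: `buildSandboxedFrame` and `escapeAttribute` are hypothetical helpers, and the stored strings are treated as fully untrusted.

```javascript
// Escape a string for use inside a double-quoted HTML attribute value.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;");
}

// Build the markup for a sandboxed iframe that hosts a stored template.
// `html` and `js` are the untrusted strings loaded from the database.
function buildSandboxedFrame(html, js) {
  // Keep a literal "</script" inside the stored JS from closing the
  // script element early (a rough fix that covers string literals).
  const safeJs = String(js).replace(/<\/(script)/gi, "<\\/$1");

  const frameDocument =
    "<!DOCTYPE html><html><body>" +
    String(html) +
    "<script>" + safeJs + "</" + "script>" +
    "</body></html>";

  // "allow-scripts" without "allow-same-origin" lets the template's script
  // run, but in an opaque origin that cannot reach the parent page's DOM,
  // cookies, or storage.
  return '<iframe sandbox="allow-scripts" srcdoc="' +
    escapeAttribute(frameDocument) +
    '"></iframe>';
}

const frame = buildSandboxedFrame('<b>Foo "Bar"</b>', "alert('Hello, world!');");
console.log(frame);
```

With this sandbox value, form submission inside the frame is blocked unless `allow-forms` is also added, so a template that posts data (as the form in Render.cshtml does) would need that flag or a message-passing bridge to the parent page. Modal dialogs such as `alert` are likewise blocked unless `allow-modals` is present.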

Upvotes: 1

Views: 1052

Answers (2)

Developer_16

Reputation: 79

Using @Html.Raw() to render HTML and JS code from the database is considered unsafe, as it leaves your application open to XSS (cross-site scripting) attacks.

To avoid this kind of risk, we can use a library that can sanitize our HTML and JS code before it is rendered in the browser.

Recommended library: the encoders in System.Text.Encodings.Web (HtmlEncoder and JavaScriptEncoder).

A simple example of how to use HtmlEncoder:

@{
    var htmlStoredInDB = "<b>Foo Bar</b>";
    var sanitizedHtml = System.Text.Encodings.Web.HtmlEncoder.Default.Encode(htmlStoredInDB);
    var jsStoredInDB = "alert('Hello, world!');";
    var sanitizedJs = System.Text.Encodings.Web.JavaScriptEncoder.Default.Encode(jsStoredInDB);
}

@Html.Raw(sanitizedHtml)
<script>
    @Html.Raw(sanitizedJs)
</script>

Note: The code above only protects against XSS attacks. It outputs the markup as plain text: it won't render 'Foo Bar' in bold, but will display "<b>Foo Bar</b>" instead.

If you want to prevent XSS attacks and also render the HTML with its tags applied, you should encode and sanitize your HTML code.

Libraries used (supported by .NET 6):

  • System.Text.Encodings.Web (used for encoding)

  • HtmlAgilityPack (used for sanitizing)

Example:

using System.Linq;
using System.Text.Encodings.Web;
using HtmlAgilityPack;

public static class HtmlUtility
{
    public static string EncodeAndSanitizeHtml(string inputHtml)
    {
        var allowedTags = new[] { "b", "i", "u" };
        var doc = SanitizeHtml(inputHtml, allowedTags);
        var writer = new System.IO.StringWriter();
        doc.Save(writer);
        return HtmlEncoder.Default.Encode(writer.ToString());
    }

    private static HtmlDocument SanitizeHtml(string html, string[] allowedTags)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the element nodes first: removing nodes while
        // enumerating the live tree invalidates the enumeration.
        var elements = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .ToList();

        foreach (var node in elements)
        {
            // Remove() drops a disallowed element together with its children.
            if (!allowedTags.Contains(node.Name))
            {
                node.Remove();
            }
        }

        return doc;
    }
}

To call the above method:

string inputHtml = "<html><head><title>Page title</title></head><body><h1>Hello world!</h1><b>happy</b></body></html>";
string outputHtml = HtmlUtility.EncodeAndSanitizeHtml(inputHtml);
Console.WriteLine(outputHtml); // Output: &lt;b&gt;happy&lt;/b&gt;

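The tags in the output are HTML-encoded as shown above, so when the string is rendered in an HTML context the browser displays them as literal text rather than as bold formatting. To have the allowed tags take effect, render the sanitized markup without the final HtmlEncoder.Default.Encode call.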

Note: The steps above are not suitable for encoding or sanitizing JavaScript code. Libraries such as DOMPurify sanitize HTML in the browser by stripping script content out of markup; they do not make arbitrary stored JavaScript safe to execute.

Upvotes: -1

Jason Pan

Reputation: 22082

I searched for this issue using the keywords <asp.net core Cleans HTML to avoid XSS attacks>.

I found that the official docs describe HtmlEncoder and JavaScriptEncoder.

I also found this excellent GitHub repo (HtmlSanitizer).

Upvotes: 0
