user1447679


No luck with HTML decoding based on safe HTML tags (vb.net or c#)

I've spent quite a bit of time trying to figure out the best way to handle this. I'm HTML-encoding rich text from untrusted user input before storing it in the database.

I've bounced back and forth between multiple discussions, and it seems the safest method is to:

HTML encode absolutely everything, and only decode based on a white/safe list prior to sending it back to the client.
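A minimal sketch of that "encode everything, then decode only whitelisted tags" approach, in VB.NET to match the question. `AllowedTags` and `EncodeThenDecodeWhitelist` are hypothetical names, and the whitelist itself is an assumption. It deliberately restores only bare tags with no attributes, so anything carrying `onclick`, `href`, `style` and so on stays encoded. Treat it as an illustration, not a vetted sanitizer.

```vbnet
Imports System
Imports System.Linq
Imports System.Net
Imports System.Text.RegularExpressions

Module SelectiveDecode
    ' Hypothetical whitelist: only these bare tags (no attributes) are ever restored.
    Private ReadOnly AllowedTags As String() = {"b", "i", "u", "em", "strong", "p", "br"}

    ' Encode everything first, then turn encoded whitelisted tags such as "&lt;b&gt;",
    ' "&lt;/b&gt;" or "&lt;br /&gt;" back into real markup. A tag with attributes
    ' does not match the pattern, so it stays encoded.
    Function EncodeThenDecodeWhitelist(untrusted As String) As String
        Dim encoded As String = WebUtility.HtmlEncode(untrusted)
        Dim alternation As String = String.Join("|", AllowedTags.Select(Function(t) Regex.Escape(t)))
        Dim pattern As String = "&lt;(/?)(" & alternation & ")\s*(/?)&gt;"
        Return Regex.Replace(encoded, pattern, "<$1$2$3>", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(EncodeThenDecodeWhitelist("<b>hi</b> <script>alert(1)</script>"))
        ' Prints: <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
    End Sub
End Module
```

Note that this does not check that restored tags are balanced (a stray `<b>` can bleed formatting into the rest of the page), which is one reason parser-based sanitizers are usually preferred.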

However, I'm also seeing strong suggestions to use the HTML Agility Pack (http://htmlagilitypack.codeplex.com/).

It parses the user input into a document tree, so you can walk the elements and compare them against your safe list (whitelist).

I've read:

C# HtmlDecode Specific tags only

https://eksith.wordpress.com/2011/06/14/whitelist-santize-htmlagilitypack/

And really, about ten other posts, and I've become frustrated because I still can't figure out the best way to handle this.

I've tried using regular expressions, calling Regex.Replace once per whitelisted tag:

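Imports System.Text.RegularExpressions

' Decode only whitelisted tags that were HTML-encoded earlier (e.g. "&lt;b&gt;" -> "<b>")
For Each tag As String In AcceptableTags.Split("|"c)
    Dim pattern As String = "&lt;(\s*/?\s*" & Regex.Escape(tag) & "\s*)&gt;"
    input = Regex.Replace(input, pattern, "<$1>", RegexOptions.IgnoreCase)
Next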

This doesn't seem to work well at all.

Is there someone out there who has a tried-and-true method, with an example implementation they wouldn't mind sharing? I'll take C# or VB.NET.

Upvotes: 1

Views: 270

Answers (1)

LoztInSpace


It depends on your data. Whitelisting during initial validation is fine if, for example, you're trying to keep HTML out of a phone number. On the other hand, if you can't be specific about what's allowed and what isn't, just store it "raw".
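For the phone-number case, whitelisting at input can be as simple as an anchored pattern of allowed characters. A minimal sketch, assuming a hypothetical rule (optional leading "+", then 7 to 20 digits, spaces, parentheses, dots or dashes); `IsAcceptablePhone` is an illustrative name, and real formats vary by country.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PhoneWhitelist
    ' Hypothetical rule: optional "+", then 7-20 of digits, space, "(", ")", "." or "-".
    ' \A and \z anchor the whole string, so "<", ">" or a trailing newline are rejected.
    Private ReadOnly AllowedPhone As New Regex("\A\+?[0-9 ().-]{7,20}\z")

    Function IsAcceptablePhone(value As String) As Boolean
        Return value IsNot Nothing AndAlso AllowedPhone.IsMatch(value)
    End Function

    Sub Main()
        Console.WriteLine(IsAcceptablePhone("+1 (555) 010-9999")) ' True
        Console.WriteLine(IsAcceptablePhone("555<script>"))       ' False
    End Sub
End Module
```

Rejecting bad input outright works here because the set of valid characters is small and known; rich text has no such set, which is why it falls into the "store it raw" case.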

It's highly unlikely that storing encoded data in a database is the correct thing to do.

Any system of even marginal complexity will eventually have non-HTML clients it has to serve data to. When you do have an HTML client, escape the output as appropriate for HTML. Same for XML. Similarly, if you decide today you like JSON better, you'll encode to that. CSV? No problem - put quotes around your values (and escape any embedded quotes) in case they contain commas. Use parameters when doing SQL. Get the idea?
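To illustrate "store raw, escape per output", here is a minimal sketch where one raw value is escaped three different ways at output time, using framework calls (`WebUtility.HtmlEncode` for HTML, `SecurityElement.Escape` for XML content). The CSV helper `ToCsvField` is a hypothetical name following the usual quote-doubling convention. SQL is not shown because it needs a database; there you would pass the raw value as a command parameter rather than escaping it.

```vbnet
Imports System
Imports System.Net
Imports System.Security

Module OutputEncoding
    ' CSV: wrap the field in quotes and double any embedded quotes.
    Function ToCsvField(value As String) As String
        Return """" & value.Replace("""", """""") & """"
    End Function

    Sub Main()
        ' The value is stored exactly as the user typed it; each output path escapes it.
        Dim raw As String = "Tom ""T"" <b>&</b>, Jr."

        Console.WriteLine(WebUtility.HtmlEncode(raw))  ' for an HTML page
        Console.WriteLine(SecurityElement.Escape(raw)) ' for XML text or attribute content
        Console.WriteLine(ToCsvField(raw))             ' for a CSV export
    End Sub
End Module
```

Because the stored value is never pre-encoded, adding a new output format later means adding one more escaping step, not undoing an HTML encoding first.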

TL;DR

  • Whitelist input if you can
  • Saving data in an output-specific encoding (such as HTML-encoded) is probably wrong
  • Always, always, always escape as appropriate for your output
  • Never try to do your own escaping - always use a trusted library. You will never do a good enough job.

Upvotes: 2
