daGrevis

Reputation: 21333

Escape from XSS vulnerability maintaining Markdown syntax?

I'm planning to use Markdown syntax on my web page. I will store the users' input raw (with no escaping) in the database and then, as usual, escape it on the fly with htmlspecialchars() when printing it out.

This is how it could look:

echo markdown(htmlspecialchars($content));

By doing that I'm protected from XSS vulnerabilities and Markdown works. Or, at least, it kind of works.

The problem is, for example, the > syntax (there are other cases too, I think).

In short, to quote you do something like this:

> This is my quote.

After escaping and parsing to Markdown I get this:

&gt; This is my quote.

Naturally, the Markdown parser does not recognize &gt; as the quote symbol, so it does not work! :(
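
The effect is easy to reproduce. Here is a minimal sketch in Python, where html.escape stands in for PHP's htmlspecialchars() (an assumption made only for illustration; both replace > with &gt;):

```python
import html

source = "> This is my quote."

# Escaping before the Markdown step rewrites the leading ">" as "&gt;",
# so a Markdown parser no longer sees the blockquote marker.
escaped = html.escape(source)

print(escaped)                  # &gt; This is my quote.
print(escaped.startswith(">"))  # False
```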

I came here to ask for solutions to this problem. One idea was to first convert the Markdown to HTML and then remove the “bad parts” with HTML Purifier.

What do you think about it? Would it actually work?

I'm sure someone has been in the same situation and can help me too. :)

Upvotes: 3

Views: 1533

Answers (2)

D.W.

Reputation: 3604

The approach you are using is not secure. Consider, for instance, this example: "[clickme](javascript:alert%28%22xss%22%29)". In general, don't escape the input to the Markdown processor. Instead, use Markdown properly in a safe mode, or apply HTML Purifier or another HTML sanitizer to the output of the Markdown processor.
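
To see why escaping does not help with that link, here is a small Python sketch. html.escape stands in for htmlspecialchars(), and is_safe_url and ALLOWED_SCHEMES are hypothetical names, not part of any Markdown library. The payload contains no HTML special characters, so escaping returns it unchanged; a check on the URL scheme of the link target is the kind of thing that actually rejects it. A real sanitizer handles many more cases than this.

```python
import html
import re

payload = "[clickme](javascript:alert%28%22xss%22%29)"

# No <, >, & or quote characters appear, so escaping changes nothing.
assert html.escape(payload) == payload

# Hypothetical allowlist of URL schemes acceptable in links.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
SCHEME_RE = re.compile(r"^([a-zA-Z][a-zA-Z0-9+.\-]*):")


def is_safe_url(url: str) -> bool:
    """Return True for scheme-less (relative) URLs or allowlisted schemes."""
    # Drop tabs and newlines and trim surrounding whitespace before checking.
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n").strip()
    match = SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme: treated as a relative URL
    return match.group(1).lower() in ALLOWED_SCHEMES


print(is_safe_url("javascript:alert%28%22xss%22%29"))  # False
print(is_safe_url("https://example.com/"))             # True
```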

I've written elsewhere about how to use Markdown securely. See the link for details about how to use it safely, but the short version is: it is important to use the latest version, to set safe_mode, and to set enable_attributes=False.

Upvotes: 1

balpha

Reputation: 50908

Yes, a certain website has that exact same situation. At the time I'm writing this, you have 1664 reputation on that website :)

On Stack Overflow, we do exactly what you describe (except that we don't render on the fly). The user-entered Markdown source is converted to plain HTML, and the result is then sanitized using a whitelist approach (JavaScript version, C# version part 1, part 2).

That's the same approach that HTML Purifier takes (having never used it, I can't speak for details though).
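
To make the whitelist idea concrete, here is a rough Python sketch using only the standard library. It is not the Stack Overflow sanitizer and not HTML Purifier; the names ALLOWED_TAGS, SAFE_URL, WhitelistSanitizer and sanitize are made up for illustration, and rendered_html is assumed to be the output of whatever Markdown processor you use. It does not balance tags or cover every edge case, so treat it as an outline of the technique rather than something to deploy.

```python
import html
import re
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED_TAGS = {
    "p": set(), "em": set(), "strong": set(), "blockquote": set(),
    "code": set(), "pre": set(), "ul": set(), "ol": set(), "li": set(),
    "a": {"href"},
}
# Only absolute http(s) and mailto links are kept in this sketch.
SAFE_URL = re.compile(r"^(?:https?://|mailto:)", re.IGNORECASE)


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside is still escaped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # drops onclick, style, and other unlisted attributes
            if name == "href" and not SAFE_URL.match(value.strip()):
                continue  # drops javascript: and other unlisted schemes
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data))


def sanitize(rendered_html: str) -> str:
    """Filter already-rendered HTML through the whitelist."""
    parser = WhitelistSanitizer()
    parser.feed(rendered_html)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p><a href="javascript:alert(1)" onclick="x()">clickme</a></p>'))
# <p><a>clickme</a></p>
```

Because the filter runs on the HTML after the Markdown step, the > quote syntax in the source is left alone and still renders as a blockquote.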

Upvotes: 4
