Reputation: 1234
I'm having trouble cleaning user input while still displaying code properly.
User data is cleaned with Bleach, which converts
< ```a < b```
to
&lt; ```a &lt; b```
Then markdown converts the markdown text to HTML
markdown.markdown(u'&lt;\n ```a &lt; b```')
And the output is
<p>&lt;\n <code>a &amp;lt; b</code></p>
I figured out that this happens because the first &lt; is treated as HTML and passed through, whereas everything in the code block is escaped again, since code is meant to be displayed, not interpreted.
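The mechanism can be reproduced with a toy model. This is a sketch under stated assumptions, not Bleach or Python-Markdown: `html.escape` stands in for Bleach escaping the bare `<` (it matches what Bleach does for this particular input only), and the hypothetical `toy_render` mimics just one Markdown behaviour, escaping the contents of backtick code spans while passing all other text through unchanged.

```python
import html
import re

# Matches an inline code span: a run of backticks, the content, the same run again.
CODE_SPAN = re.compile(r"(`+)(.+?)\1")


def toy_render(text: str) -> str:
    """Hypothetical stand-in for a Markdown renderer.

    Only code spans are handled: their contents are HTML-escaped so they are
    displayed literally. Everything else is passed through untouched, so an
    entity such as "&lt;" outside a code span survives as-is.
    """
    return CODE_SPAN.sub(
        lambda m: "<code>" + html.escape(m.group(2).strip(), quote=False) + "</code>",
        text,
    )


user_input = "< ```a < b```"

# Step 1 (sanitize first): html.escape stands in for Bleach escaping the bare "<".
sanitized = html.escape(user_input, quote=False)
print(sanitized)  # &lt; ```a &lt; b```

# Step 2: the renderer escapes the code span again, so "&lt;" becomes "&amp;lt;".
print(toy_render(sanitized))  # &lt; <code>a &amp;lt; b</code>
```

Rendering the raw input instead (`toy_render(user_input)`) leaves the code span as `a &lt; b`, which is why the order of the two steps matters.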
Any suggestions, or are there other libraries that specifically clean Markdown?
Upvotes: 0
Views: 2065
Reputation: 42537
Bleach is an HTML sanitizer, not a Markdown sanitizer. It is understandable that you want to sanitize input from untrusted users on your website. However, you would normally run Bleach on the output of Markdown (which is HTML), not on the Markdown text itself.
sanitized_html = bleach.clean(markdown.markdown(some_text))
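A dependency-free sketch of the render-then-sanitize order. Real code would call `markdown.markdown()` and then `bleach.clean()` as in the line above; here the input string is the HTML that, per this answer, Python-Markdown produces for the question's input, and the hypothetical `ToySanitizer` is a deliberately small allowlist stand-in for Bleach (keep allowlisted tags without attributes, escape every other tag, pass entities through). It is for illustration only and is not a replacement for Bleach.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for this sketch; Bleach takes its own via tags=.
ALLOWED_TAGS = {"p", "code", "pre", "em", "strong"}


class ToySanitizer(HTMLParser):
    """Illustrative allowlist sanitizer, NOT a substitute for Bleach."""

    def __init__(self) -> None:
        # Report entities such as "&lt;" through callbacks instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Allowed tags are re-emitted without attributes; others become visible text.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>" if tag in ALLOWED_TAGS else escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        # Text the renderer already escaped is passed through, not re-escaped.
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def sanitize(html_text: str) -> str:
    parser = ToySanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


# Render first (this string stands in for markdown.markdown's output), then sanitize.
rendered = "<p>&lt; <code>a &lt; b</code></p>"
print(sanitize(rendered))  # <p>&lt; <code>a &lt; b</code></p>
print(sanitize("<p>hi<script>alert(1)</script></p>"))
```

Because sanitizing happens last, the `&lt;` inside the code span is left alone while the injected `<script>` tag is neutralized. One detail worth checking in a real setup: as far as I know, Bleach's default tag allowlist does not include `p`, so `bleach.clean()` with default arguments will escape the paragraph tags Markdown emits; passing an explicit `tags=` allowlist covering the tags your Markdown produces avoids that.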
Go ahead and pass your example Markdown text into Python-Markdown. You will get perfectly acceptable results. In fact, your output (using Bleach first) is actually incorrect. Notice that the code now contains &amp;lt;, which your browser would display as &lt; rather than <. The output you really want is:
<p>&lt; <code>a &lt; b</code></p>
and that is exactly what Python-Markdown gives you right out of the box. Python-Markdown's dingus shows both the HTML source and preview for a given input. You might want to play with it to see what I mean.
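A quick way to check what a reader would see for each version is to decode the entities once with Python's `html.unescape`. This only approximates what a browser does (a browser parses the full HTML), but for plain text inside a code span the result matches what is displayed.

```python
import html

# Code-span contents produced with and without running Bleach first.
bleach_first = "a &amp;lt; b"  # Bleach, then Markdown
markdown_only = "a &lt; b"     # Markdown alone (what you want)

# Decoding the entities once approximates what the reader sees on the page.
print(html.unescape(bleach_first))   # a &lt; b
print(html.unescape(markdown_only))  # a < b
```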
If you are concerned about users submitting bad Markdown that will break things, you may be pleased to know that one of Python-Markdown's stated goals is to be suitable "to use in web server environments (never raise an exception, never write to stdout, etc.)". In other words, bad user input should not crash your server. Sure, they could inject malicious HTML/JavaScript, but that's what Bleach is for: sanitizing the HTML after Markdown builds it from the user input.
One final comment. Yes, I know Python-Markdown has a "safe_mode". However, that is an unfortunately named feature. A more apt name might be "strip_html" or "escape_html" (it can do either). As the primary developer of Python-Markdown I recommend Bleach for sanitizing input from untrusted users.
Upvotes: 4