Julian Medic

Reputation: 53

Python convert markdown to html fix

This is my current code:

import bleach
import markdown


html = """
**### The Facebook Campaign will be aligned:**
**### What you get:**
"""


def render_markdown(text):
    if not text:
        return ''

    html = markdown.markdown(text, extensions=[
        'markdown.extensions.sane_lists',
        'markdown.extensions.nl2br',
    ])
    return bleach.clean(html, tags=[
        'p', 'h1', 'h2', 'br', 'h3', 'b', 'strong', 'u', 'i', 'em', 'hr', 'ul', 'ol', 'li', 'blockquote'
    ])


print(render_markdown(html))

The problem is that some users add extra Markdown markup inside bold text, such as **### What you get:**, and the conversion result is this:

<p><strong>### The Facebook Campaign will be aligned to your business goal (1 x goal per hour):</strong></p>
<p><strong>### What you get:</strong></p>

How can I prevent this? I want to return clean HTML with no leftover Markdown markup in the text. The ideal output is:

<p><strong>The Facebook Campaign will be aligned to your business goal (1 x goal per hour):</strong></p>
<p><strong>What you get:</strong></p>

Upvotes: 1

Views: 2374

Answers (1)

Arount

Reputation: 10403

**### foo**

That input is parsed correctly; it really does mean:

<p><strong>### foo</strong></p>

This is not a Python or Markdown issue; the user simply doesn't know how to format Markdown.

If you want to clean this up, you will have to pre-process the user's input. That is not really a Markdown question but a more general text-processing one, and it will most likely require a regex.


  • For <h3>Foo</h3>, write ### Foo.
  • For <p><strong>Foo</strong></p>, write **Foo**.
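
Before deciding how to handle it, it may help to see how often users make this mistake by scanning the input for bold spans that start with heading markers. This is a minimal sketch using only the standard library. The helper name `find_heading_markers_in_bold` is hypothetical, and the pattern only covers the `**### text**` form from the question (asterisk bold, with `#` directly after the opening `**`):

```python
import re

# Matches a bold span whose content starts with one or more '#' characters,
# e.g. '**### foo**'. Only the asterisk form of bold is covered (assumption).
BOLD_WITH_HASHES = re.compile(r'\*{2}#+[^*]*\*{2}')


def find_heading_markers_in_bold(text):
    """Return every bold span in `text` that starts with '#' markers."""
    return BOLD_WITH_HASHES.findall(text)


sample = "**### What you get:**\n**Fine as it is**\n### A real heading"
print(find_heading_markers_in_bold(sample))
```

Correctly formatted bold text and real headings are not reported, so the result can be used to warn the user or to decide whether automatic cleanup is worth it.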

Edit because of comments

OK, so you want to fix this specific case. Here is how:

import re
string = '**### foo**'

# Drop the '#' markers (and the spaces around the text) inside a bold span.
# Raw strings avoid invalid-escape warnings on Python 3.
print(re.sub(r'\*{2}#+\s*([^*]+?)\s*\*{2}', r'**\1**', string))

Output

**foo**

So, final function:

import re

import bleach
import markdown


def render_markdown(text):
    if not text:
        return ''

    # Remove heading markers that users put inside bold text
    text = re.sub(r'\*{2}#+\s*([^*]+?)\s*\*{2}', r'**\1**', text)
    html = markdown.markdown(text, extensions=[
        'markdown.extensions.sane_lists',
        'markdown.extensions.nl2br',
    ])
    return bleach.clean(html, tags=[
        'p', 'h1', 'h2', 'br', 'h3', 'b', 'strong', 'u', 'i', 'em', 'hr', 'ul', 'ol', 'li', 'blockquote'
    ])
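
The pre-cleaning step can also be checked on its own, without Markdown or bleach installed. This is a sketch, not a definitive implementation: the helper name `strip_hashes_in_bold` is hypothetical, and it uses a whitespace-trimming variant of the pattern so that the result is `**text**` with no padding inside the asterisks:

```python
import re

# Bold span that starts with '#' markers; the captured text is trimmed of
# surrounding whitespace. Only '**...**' bold is handled (assumption).
HASHES_IN_BOLD = re.compile(r'\*{2}#+\s*([^*]+?)\s*\*{2}')


def strip_hashes_in_bold(text):
    """Rewrite '**### foo**' as '**foo**'; other text is left untouched."""
    return HASHES_IN_BOLD.sub(r'**\1**', text)


user_input = (
    "**### The Facebook Campaign will be aligned:**\n"
    "\n"
    "**### What you get:**\n"
    "### A real heading\n"
)
print(strip_hashes_in_bold(user_input))
```

Real headings such as `### A real heading` pass through unchanged, because the pattern only fires when `#` directly follows an opening `**`.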

Upvotes: 2
