digiadit

Reputation: 157

Python html-sanitizer allow img tag

Hello guys, I am using the html-sanitizer Python package, but I'm unable to enable img tags, which are disabled by default.

I tried editing sanitizer.py in site-packages (shown below), but still no luck.

DEFAULT_SETTINGS = {
    "tags": {
        "a",
        "h1",
        "h2",
        "h3",
        "strong",
        "em",
        "p",
        "ul",
        "ol",
        "li",
        "br",
        "sub",
        "sup",
        "hr",
        "img"
    },
    "attributes": {
        "a": ("href", "name", "target", "title", "id", "rel"),
        "img": ("src",),  # trailing comma makes this a one-element tuple, not a string
    },
    "empty": {"hr", "a", "br"},
    "separate": {"a", "p", "li"},
    "whitespace": {"br"},
    "add_nofollow": False,
    "autolink": False,
    "sanitize_href": sanitize_href,
    "element_preprocessors": [
        # convert span elements into em/strong if a matching style rule
        # has been found. strong has precedence, strong & em at the same
        # time is not supported
        bold_span_to_strong,
        italic_span_to_em,
        tag_replacer("b", "strong"),
        tag_replacer("i", "em"),
        tag_replacer("form", "p"),
        target_blank_noopener,
    ],
    "element_postprocessors": [],
}

Can somebody help me out? I want the img tag with only the src attribute.

Upvotes: 3

Views: 1295

Answers (1)

pbuck

Reputation: 4551

Sanitizer won't use DEFAULT_SETTINGS if different settings are passed through the settings argument when constructing Sanitizer(). That might be what's going on here, but I suspect the real problem is the "empty" setting.

The sanitizer also removes tags that are empty, so, for example, <em></em> is cleaned to ''. That's nice, but <img .../> is also an empty tag (that is, it has no children), so the sanitizer removes it too.

You need to add img to the settings['empty'] set, along with the current {"hr", "a", "br"}.

While you're at it, don't edit DEFAULT_SETTINGS in site-packages; define your own settings from a deep copy of the defaults instead. For example:

import copy
import html_sanitizer.sanitizer

# Make a deep copy so the nested sets in DEFAULT_SETTINGS stay untouched
my_settings = copy.deepcopy(html_sanitizer.sanitizer.DEFAULT_SETTINGS)

# Add your changes
my_settings['tags'].add('img')
my_settings['empty'].add('img')
my_settings['attributes'].update({'img': ('src',)})

# Use it
s = html_sanitizer.Sanitizer(settings=my_settings)
s.sanitize('<em><img src="/index.html"/></em>')
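
One thing to watch for when copying the defaults: dict(...) only makes a shallow copy, so the nested sets ('tags', 'empty') are still shared with the original and .add() changes both. A minimal sketch of the difference, using a small hypothetical stand-in dict named DEFAULTS rather than the library's real DEFAULT_SETTINGS, so it runs without html-sanitizer installed:

```python
import copy

# Hypothetical stand-in with the same shape as the library defaults (fewer keys).
DEFAULTS = {
    "tags": {"a", "p", "br"},
    "empty": {"a", "br"},
    "attributes": {"a": ("href",)},
}

# Shallow copy: new outer dict, but the inner sets are the same objects.
shallow = dict(DEFAULTS)
shallow["tags"].add("img")
leaked = "img" in DEFAULTS["tags"]   # the "default" was modified too

# Undo the leak before the next comparison.
DEFAULTS["tags"].discard("img")

# Deep copy: inner sets and dicts are copied as well.
deep = copy.deepcopy(DEFAULTS)
deep["tags"].add("img")
deep["empty"].add("img")
deep["attributes"]["img"] = ("src",)
isolated = "img" not in DEFAULTS["tags"]

print("shallow copy leaked into defaults:", leaked)
print("deep copy left defaults alone:", isolated)
```

With a deep copy, every Sanitizer that relies on the defaults elsewhere in the same process keeps its original behavior.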

Upvotes: 6
