Reputation: 157
Hello guys, I am using the html-sanitizer Python package, but I'm unable to enable img tags, which are disabled by default.
I tried editing sanitizer.py (shown below) in site-packages, but still no luck.
DEFAULT_SETTINGS = {
    "tags": {
        "a",
        "h1",
        "h2",
        "h3",
        "strong",
        "em",
        "p",
        "ul",
        "ol",
        "li",
        "br",
        "sub",
        "sup",
        "hr",
        "img"
    },
    "attributes": {"a": ("href", "name", "target", "title", "id", "rel"), "img": ("src")},
    "empty": {"hr", "a", "br"},
    "separate": {"a", "p", "li"},
    "whitespace": {"br"},
    "add_nofollow": False,
    "autolink": False,
    "sanitize_href": sanitize_href,
    "element_preprocessors": [
        # convert span elements into em/strong if a matching style rule
        # has been found. strong has precedence, strong & em at the same
        # time is not supported
        bold_span_to_strong,
        italic_span_to_em,
        tag_replacer("b", "strong"),
        tag_replacer("i", "em"),
        tag_replacer("form", "p"),
        target_blank_noopener,
    ],
    "element_postprocessors": [],
}
Can somebody help me out? I want the img tag allowed with only the src attribute.
Upvotes: 3
Views: 1295
Reputation: 4551
Sanitizer won't use DEFAULT_SETTINGS if different settings are provided through the settings={} argument when initializing Sanitizer(). That might be what's going on here, but I suspect it's the empty setting that's wrong.
sanitizer also removes tags that are empty, so, for example, <em></em> is cleaned to ''. That's nice, but <img .../> is also an empty tag (that is, it has no children), so sanitizer removes it.
You need to add img to the settings['empty'] set, alongside the current {"hr", "a", "br"}.
While you're at it, don't edit DEFAULT_SETTINGS in site-packages; define your own settings from a copy of it instead. For example:
import copy
import html_sanitizer

# Deep-copy the defaults so the nested sets and dicts are not shared with the library
my_settings = copy.deepcopy(html_sanitizer.sanitizer.DEFAULT_SETTINGS)
# Add your changes
my_settings['tags'].add('img')
my_settings['empty'].add('img')
my_settings['attributes'].update({'img': ('src',)})
# Use it
s = html_sanitizer.Sanitizer(settings=my_settings)
s.sanitize('<em><img src="/index.html"/></em>')
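One caveat when copying the defaults: dict(...) only makes a shallow copy, so adding to the nested tags or empty sets would still change the shared defaults, whereas copy.deepcopy avoids that. Here is a minimal sketch of the difference using a trimmed-down stand-in dictionary (the name DEFAULTS and its contents are made up for illustration, they are not the library's real defaults). It also shows why the trailing comma in ('src',) matters.

```python
import copy

# Hypothetical, trimmed-down stand-in for the library's default settings.
DEFAULTS = {
    "tags": {"a", "p", "br"},
    "empty": {"a", "br"},
    "attributes": {"a": ("href", "title")},
}

# dict() copies only the top level: the nested sets are still shared.
shallow = dict(DEFAULTS)
shallow["tags"].add("img")
shallow_leaked = "img" in DEFAULTS["tags"]  # the shared defaults were modified

# Undo that change, then repeat with a deep copy, which duplicates nested containers.
DEFAULTS["tags"].discard("img")
deep = copy.deepcopy(DEFAULTS)
deep["tags"].add("img")
deep["empty"].add("img")
deep["attributes"]["img"] = ("src",)
deep_leaked = "img" in DEFAULTS["tags"]  # the defaults stay untouched

# ("src") is just the string "src"; the trailing comma is what makes a tuple.
is_str = isinstance(("src"), str)
is_tuple = isinstance(("src",), tuple)

print(shallow_leaked, deep_leaked, is_str, is_tuple)
```

The same reasoning should carry over to the real settings dictionary, since its tags and empty entries are sets as well.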
Upvotes: 6