Reputation: 589
I am currently working on this snippet:
@register_snippet
class Foo(models.Model):
    body = RichTextField()
I decided that I need to define __str__ so that I can distinguish between multiple snippets. Using the body contents as the string representation seems like a good idea. However, body can grow very large, so the first 20 characters should do.
Thinking this task is a piece of cake, I wrote this code:
@register_snippet
class Foo(models.Model):
    body = RichTextField()

    def __str__(self):
        return self.body[:20]
... which resulted in ugly representations such as <p>Quequeueueueue?</ and <p>FOO<br/></p>.
I checked the database and found that the markup is stored there along with the text, which in hindsight is pretty obvious. Still, I'm wondering:
What is the way to remove HTML markup without using CPU-heavy HTML parsers and without parsing the whole body?
I believe regular expressions can remove some tags, but how do I handle situations like </a, </blockqu, <, </?
Upvotes: 3
Views: 1208
Reputation: 688
As suggested in a comment by @Ian Price, if you can trust the content you're working on, a regex works fine. For example:
import re
...
    def __str__(self):
        # Non-greedy match of everything between '<' and the next '>'
        remove_tags = re.compile('<.*?>')
        return re.sub(remove_tags, '', self.body)
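Note that the method above returns the whole body with the tags removed. A minimal sketch of how it could be combined with the 20-character limit from the question: strip the tags first and slice afterwards, so a tag is never cut in half and fragments like </blockqu cannot appear. The helper name plain_preview and its limit parameter are hypothetical; the pattern is the one from this answer, with re.DOTALL added on the assumption that a tag may span a newline. As with any regex approach, this is only reasonable for content you trust.

```python
import html
import re

# Same non-greedy pattern as in the answer, compiled once at import time.
# re.DOTALL is an added assumption so that tags spanning newlines still match.
TAG_RE = re.compile(r"<.*?>", re.DOTALL)


def plain_preview(markup, limit=20):
    """Return at most `limit` characters of `markup` with tags removed."""
    # Replace each tag with a space so "<p>a</p><p>b</p>" does not become "ab".
    text = TAG_RE.sub(" ", markup)
    # Decode entities such as &amp; that the editor may have stored.
    text = html.unescape(text)
    # Collapse the whitespace runs left behind by the removed tags.
    text = " ".join(text.split())
    # Slice last: the text no longer contains tags, so none can be truncated.
    return text[:limit]
```

Inside the model, __str__ could then simply return plain_preview(self.body). The regex still scans the whole body; if that turns out to matter, slicing the raw markup to a generous length before stripping would reintroduce the half-tag problem at the cut, so it is probably not worth it.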
Upvotes: 0
Reputation: 7616
Django has no built-in way to strip HTML that guarantees safe output. From the docs:
Deprecated since version 1.8:
removetags cannot guarantee HTML safe output and has been deprecated due to security concerns. Consider using bleach instead.
Bleach is an HTML sanitizing library from Mozilla. It may make the most sense to add a method to your snippet class that returns a bleached string with all tags removed.
In addition, you can use the built-in template filters truncatewords, truncatechars, etc. to limit how many characters/words are shown when the template is rendered.
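Those filters only apply at render time. For truncating in Python code, for instance inside __str__, a rough stand-in is the standard library's textwrap.shorten. This is a swapped-in stdlib technique, not the Django filters themselves, and the helper name short_label is hypothetical. It assumes the text has already had its tags removed.

```python
import textwrap


def short_label(text, width=20):
    """Shorten tag-free text to at most `width` characters, on word boundaries."""
    # shorten() collapses whitespace, drops whole words that do not fit,
    # and appends the placeholder; the result never exceeds `width`.
    return textwrap.shorten(text, width=width, placeholder="...")
```

Word-based truncation handles text with no spaces inside the limit poorly, so plain slicing may suit such input better.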
Upvotes: 2