Reputation: 31

Strip BBCode from string

I'm trying to find a way to strip BBCode from a string. The modules that I've found (BBCode and Post Markup) seem to only translate them to HTML rather than just remove the BBCode and return a clean string. If I'm missing something and one of those actually does what I'm asking I'd love some direction on it :)

Otherwise, is there any way to strip BBCode from a string and return plain text?

Upvotes: 1

Answers (2)

MattDMo

Reputation: 102862

Your answer is actually within the bbcode module. Unfortunately, the relevant method is not in the documentation, but if you search through the code it's there: Parser.strip(). For example:

import bbcode

parser = bbcode.Parser()
code = "[code]a = [1, 2, 3, 4, 5][/code]"
plain_txt = parser.strip(code)
print(plain_txt)

which prints

a = [1, 2, 3, 4, 5]

Unfortunately, both Robᵩ's regex-based answer and postmarkup suffer from the same problem: they cannot differentiate between actual BBCode ([list][*]Item 1[*]Item 2[/list], [color=red]I hate color-blind people![/color], etc.) and bracketed text that merely looks like it. For the embedded code example I used above, they both return a =, and the same thing happens with a line like

I'm feeling sad :[ But, eating ice cream cheers me up! :]

which simply returns

I'm feeling sad :

Parser.strip() handles these cases correctly because bbcode tokenizes the string first, searching for valid BBCode tokens and treating everything else as part of the overall text. Parser.strip() then simply throws the BBCode tokens away and reassembles the text, while the formatting methods turn those tokens into XHTML markup and splice in the rest where appropriate.
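The tokenize-then-discard idea can be sketched in a few lines. This is only a rough illustration and not the bbcode module's actual implementation: KNOWN_TAGS and strip_known_tags are made-up names, and the tag list is an assumption.

```python
import re

# Hypothetical set of recognised tag names (an assumption for this sketch;
# the bbcode module keeps its own registry of tags).
KNOWN_TAGS = {"b", "i", "u", "s", "url", "quote", "code", "list", "*", "color"}

# Matches [tag], [/tag] or [tag=value]; group 1 captures the tag name.
TAG_RE = re.compile(r"\[/?([a-zA-Z*]+)(?:=[^\]]*)?\]")

def strip_known_tags(text):
    """Remove bracketed spans only when their name is a known BBCode tag."""
    def replace(match):
        name = match.group(1).lower()
        # Unknown bracketed text is returned unchanged.
        return "" if name in KNOWN_TAGS else match.group(0)
    return TAG_RE.sub(replace, text)

print(strip_known_tags("[code]a = [1, 2, 3, 4, 5][/code]"))
# a = [1, 2, 3, 4, 5]
print(strip_known_tags("I'm feeling sad :[ But, eating ice cream cheers me up! :]"))
# I'm feeling sad :[ But, eating ice cream cheers me up! :]
```

Note that this sketch does not treat the contents of [code] as literal text, so a known tag nested inside a code block would still be removed. A real tokenizer has to track that kind of context.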

Upvotes: 3

Robᵩ

Reputation: 168636

Depending upon your needs, this might be sufficient:

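# UNTESTED
import re

with open("some_input_file.txt") as input_file:
    for s in input_file:
        # Drop everything from a "[" up to the next "]"
        s = re.sub(r'\[.*?]', '', s)
        # Each line already ends with a newline, so don't add another
        print(s, end='')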
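To judge whether this is sufficient for your input, it may help to see what the pattern does to text that contains literal square brackets. The helper name strip_brackets below is made up for this check. The pattern is the one used above.

```python
import re

# Same pattern as above: a "[" followed by the shortest run up to the next "]".
BRACKETED = re.compile(r"\[.*?]")

def strip_brackets(s):
    """Remove every bracketed span, whether or not it is a real BBCode tag."""
    return BRACKETED.sub("", s)

print(repr(strip_brackets("[b]bold[/b] text")))
# 'bold text'
print(repr(strip_brackets("[code]a = [1, 2, 3, 4, 5][/code]")))
# 'a = '
print(repr(strip_brackets("I'm feeling sad :[ But, eating ice cream cheers me up! :]")))
# "I'm feeling sad :"
```

So this works when every bracket in the input belongs to a tag, and loses text otherwise.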

Upvotes: 0
