user3541631

Reputation: 4008

Cleaning HTML text, issues with replace

I have an editor that adds `<p><br></p>` or an empty `<p></p>`, and I want to replace or remove them.

I use:

  value = value.replace('<p><br></p>', '<br>').replace('<p></p>','').strip('<br>')

The problem is that this sometimes removes everything, and in all cases the first paragraph ends up starting with p> (the first character of the tag is removed).

Upvotes: 0

Views: 74

Answers (2)

Remy J

Reputation: 729

Based on your solution, why not just do this?

value = value.replace("<p>", '').replace("</p>", '')

Shouldn't that be enough?
All <p> and </p> tags would be removed and the rest of the string would remain untouched.

For value = "<p><br></p>" you will get "<br>".
For value = "<p></p>" you will get ''.
For value = "<p></p>oueo<p>54<br>65</p>eoue<p></p>" you will get "oueo54<br>65eoue".

Upvotes: 1

Odysseas

Reputation: 1060

Your error is in how you use the strip method: it treats its argument as a set of characters, not as a substring, and removes any leading or trailing characters that belong to that set. So '<b>hello</b>'.strip('<br>') returns 'hello</', for example.
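A quick sketch of that behaviour, in case it helps to see it run. The sample strings are made up for illustration; the second one mirrors the "p>" symptom from the question.

```python
# str.strip(chars) removes leading/trailing characters found in the set
# {'<', 'b', 'r', '>'}; it does not look for the substring "<br>".
stripped = "<b>hello</b>".strip("<br>")
print(stripped)  # hello</

# Only '<' is stripped from the front ('p' is not in the set) and only '>'
# from the back, which is why the first tag ends up starting with "p>".
first_paragraph = "<p>text</p>".strip("<br>")
print(first_paragraph)  # p>text</p
```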

If you want to remove a <br> at the beginning and at the end of the value string, you can do it like so:

if value.startswith('<br>'):
    value = value[4:]
if value.endswith('<br>'):
    value = value[:-4]
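Note that the snippet above removes at most one <br> from each end. If the editor can emit several empty paragraphs in a row, something like the following might work. This is only a sketch: `clean_editor_html` is a hypothetical helper name, and it assumes the tags appear exactly as `<p><br></p>`, `<p></p>` and `<br>` (no attributes, no whitespace inside the tags).

```python
import re


def clean_editor_html(value: str) -> str:
    """Hypothetical helper: drop empty paragraphs, then trim <br> at both ends."""
    # Same replacements as in the question: <p><br></p> becomes <br>,
    # and an empty <p></p> is dropped.
    value = value.replace("<p><br></p>", "<br>").replace("<p></p>", "")
    # Remove one or more whole "<br>" substrings at the start and at the end,
    # instead of stripping individual characters.
    value = re.sub(r"^(?:<br>)+", "", value)
    value = re.sub(r"(?:<br>)+$", "", value)
    return value


print(clean_editor_html("<p><br></p><p>hello</p><p></p><p><br></p>"))
```

For that input the empty paragraphs around `<p>hello</p>` are removed and the paragraph itself is left intact. For real-world HTML with attributes or nested markup, an HTML parser would likely be more robust than string replacement.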

Upvotes: 1
