Removing specific HTML tags with Python

Question

I have some HTML tables inside of an HTML cell, like so:

miniTable = '''<table bgcolor="%s">
               <tr><td><font color="%s"><b>%s</b></font></td></tr>
           </table>''' % (bgcolor, fontColor, floatNumber)

html += '<td>' + miniTable + '</td>'

Is there a way to remove the HTML tags that pertain to this minitable, and only these html tags?
I would like to somehow remove these tags:

<table ...><tr><td><font ...><b>

and

</b></font></td></tr></table>

to get this:

floatNumber

where floatNumber is the string representation of a floating point number. I don't want any of the other HTML tags to be modified in any way. I was thinking of using string.replace or regex, but I'm stumped.

fedosov · Accepted Answer

If you can't install and use Beautiful Soup (otherwise BS is preferred, as @otto-allmendinger proposed):

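import re

# Example input; the attribute values are placeholders.
s = '<table bgcolor="#ffffff"><tr><td><font color="#000000"><b>1.23</b></font></td></tr></table>'
# Strip only the table, tr, td, font and b tags (opening and closing).
result = float(re.sub(r"</?table[^>]*>|</?t[rd]>|</?font[^>]*>|</?b>", "", s))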
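
Note that a substitution like the one above removes every matching tag in the string it is given, so it is only safe on the mini-table fragment itself, not on the whole page. If the full page has to be processed, one option is to match each mini-table as a unit and replace it with the text it wraps. This is a minimal sketch, assuming the mini-table always has the nesting table > tr > td > font > b; the helper name unwrap_mini_tables and the sample markup are hypothetical:

```python
import re

# Matches one complete mini-table and captures the text inside <b>...</b>.
# Assumes the nesting table > tr > td > font > b, with optional whitespace
# between the tags. Tables with any other shape are left untouched.
MINI_TABLE = re.compile(
    r"<table[^>]*>\s*<tr>\s*<td>\s*<font[^>]*>\s*<b>"
    r"\s*([^<]+?)\s*"
    r"</b>\s*</font>\s*</td>\s*</tr>\s*</table>",
    re.IGNORECASE,
)


def unwrap_mini_tables(html):
    """Replace each mini-table with the text it wraps (hypothetical helper)."""
    return MINI_TABLE.sub(r"\1", html)


# Sample page: an outer table whose first cell holds a mini-table.
mini = '<table bgcolor="#ffffff"><tr><td><font color="#000000"><b>1.23</b></font></td></tr></table>'
page = '<table><tr><td>' + mini + '</td><td><b>label</b></td></tr></table>'

print(unwrap_mini_tables(page))
# <table><tr><td>1.23</td><td><b>label</b></td></tr></table>
```

The outer table, its cells, and the unrelated <b>label</b> are preserved because they do not fit the mini-table shape. This only works when the markup is generated in exactly this form; for arbitrary HTML, a parser such as Beautiful Soup remains the more robust choice.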
