I am using the Python HTMLParser module to extract the values Sam and John from the HTML snippet below, but my handle_data method captures only Sam, not John.
<tr>
<td style="color: #0000FF;text-align: center"><p>Sam<br/>John<br/></p></td>
</tr>
How can I get both Sam and John?
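You can keep an instance-level flag that is either True or False. Set it to True when a p tag starts and to False when the p tag ends. While the flag is True, collect the data in the handle_data() method. Note that handle_data() is called once for each run of text, so the <br/> between Sam and John produces two separate calls, one per name: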
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser

data = """
<tr>
<td style="color: #0000FF;text-align: center"><p>Sam<br/>John<br/></p></td>
</tr>
"""

class Parser(HTMLParser):
    def __init__(self):
        super().__init__()
        # True only while the parser is inside a <p> element
        self.recording = False

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.recording = True

    def handle_endtag(self, tag):
        if tag == 'p':
            self.recording = False

    def handle_data(self, data):
        # Called once per text run: "Sam" and "John" arrive in separate calls
        if self.recording:
            print(data)

parser = Parser()
parser.feed(data)
Prints:
Sam
John
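If you need the names as data rather than printed output, a small variation on the same flag approach can append each text run to a list. This is a sketch for Python 3 (where the module is html.parser); the class name ParagraphTextCollector and the values attribute are hypothetical names chosen for illustration. It also strips whitespace so that stray newlines inside the paragraph are skipped.

```python
from html.parser import HTMLParser


class ParagraphTextCollector(HTMLParser):
    """Hypothetical helper: collect every text run found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.recording = False  # True while we are between <p> and </p>
        self.values = []        # collected text runs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.recording = True

    def handle_endtag(self, tag):
        if tag == 'p':
            self.recording = False

    def handle_data(self, data):
        # handle_data fires once per text run; <br/> splits "Sam" and "John"
        # into two calls, so append each run instead of keeping only one.
        text = data.strip()
        if self.recording and text:
            self.values.append(text)


html = """
<tr>
<td style="color: #0000FF;text-align: center"><p>Sam<br/>John<br/></p></td>
</tr>
"""

parser = ParagraphTextCollector()
parser.feed(html)
parser.close()  # flush any buffered text
print(parser.values)  # ['Sam', 'John']
```

Accumulating into a list avoids the common pitfall of assigning the data to a single attribute, where each later call to handle_data would overwrite the earlier one.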