Reputation: 52143
I am using the HTMLParser library of Python 2.7 to extract some information from HTML content fetched from a remote URL. I don't quite understand how to detect the exact moment when the parser instance finishes parsing the HTML data.
The basic implementation of my parser class looks like this:
import HTMLParser

import urllib3


class MyParser(HTMLParser.HTMLParser):
    def __init__(self, url):
        self.url = url
        self.users = set()

    def start(self):
        # Reset parser state, fetch the page, then parse it in one go.
        self.reset()
        response = urllib3.PoolManager().request('GET', self.url)
        if not str(response.status).startswith('2'):
            raise urllib3.exceptions.HTTPError('HTTP error here..')
        self.feed(response.data.decode('utf-8'))

    def handle_starttag(self, tag, attrs):
        # Collect the data-name of every matching <div>.
        if tag == 'div':
            attrs = dict(attrs)
            if attrs.get('class') == 'js_userPictureOuterOnRide':
                user = attrs.get("data-name")
                if user:
                    self.users.add(user)

    def reset(self):
        HTMLParser.HTMLParser.reset(self)
        self.users.clear()
My question is: how can I detect that the parsing process has finished?
Thanks.
Upvotes: 0
Views: 394
Reputation: 11347
HTMLParser is synchronous; that is, once feed() returns, all data so far has been parsed and all callbacks have been called.
self.feed(response.data.decode('utf-8'))
print 'ready!'
(if I misunderstood your question, please let me know).
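To see this without touching the network, here is a minimal sketch. The UserParser class, its parse() method and the SAMPLE markup are made up for illustration; only the tag handling is taken from your code. It also calls close() after feed(), which tells the parser that no more input is coming so any buffered trailing data gets processed. The import is wrapped in try/except only so that the snippet also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser  # Python 2.7, as in the question
except ImportError:
    from html.parser import HTMLParser  # Python 3


class UserParser(HTMLParser):
    """Hypothetical offline variant of MyParser (no HTTP request)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.users = set()

    def handle_starttag(self, tag, attrs):
        # Same matching logic as in the question.
        if tag == 'div':
            attrs = dict(attrs)
            if attrs.get('class') == 'js_userPictureOuterOnRide':
                user = attrs.get('data-name')
                if user:
                    self.users.add(user)

    def parse(self, html):
        self.reset()          # clear the base-class parser state
        self.users = set()
        self.feed(html)       # returns only after the callbacks have run
        self.close()          # no more input: process anything still buffered
        return self.users     # reaching this line means parsing is finished


# Made-up sample markup, assumed to resemble the real page.
SAMPLE = (
    '<div class="js_userPictureOuterOnRide" data-name="alice"></div>'
    '<div class="other" data-name="ignored"></div>'
    '<div class="js_userPictureOuterOnRide" data-name="bob"></div>'
)

parser = UserParser()
users = parser.parse(SAMPLE)
print('ready! users: %s' % sorted(users))
```

So there is no "finished" event to catch: whatever you put right after feed() (or after close(), if you want to be strict about trailing data) is your completion hook.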
Upvotes: 1