Python HTMLParser Detecting the End of Data

Question

I am using the HTMLParser library of Python 2.7 to extract some information from HTML content fetched from a remote URL. I do not understand how to detect the exact moment when the parser instance finishes parsing the HTML data.

The basic implementation of my parser class looks like this:

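import HTMLParser

import urllib3


class MyParser(HTMLParser.HTMLParser):
    def __init__(self, url):
        self.url = url
        self.users = set()
        # The base __init__ calls self.reset(), so self.users must exist first.
        HTMLParser.HTMLParser.__init__(self)

    def start(self):
        # Clear previous state, fetch the page, and feed it to the parser.
        self.reset()
        response = urllib3.PoolManager().request('GET', self.url)
        if not 200 <= response.status < 300:
            raise urllib3.exceptions.HTTPError('HTTP error here..')
        self.feed(response.data.decode('utf-8'))

    def handle_starttag(self, tag, attrs):
        # Collect the data-name of every matching user <div>.
        if tag == 'div':
            attrs = dict(attrs)
            if attrs.get('class') == 'js_userPictureOuterOnRide':
                user = attrs.get("data-name")
                if user:
                    self.users.add(user)

    def reset(self):
        # Reset the parser state and forget previously collected users.
        HTMLParser.HTMLParser.reset(self)
        self.users.clear()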

My question is: how can I detect that the parsing process has finished?

Thanks.


Answers (1)
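
One possible approach, offered as a sketch rather than a definitive answer: HTMLParser has no dedicated "end of data" handler, but feed() is synchronous, so it returns only after the text passed to it has been processed as far as possible. Calling close() afterwards forces any buffered data to be handled as if followed by end-of-file. Overriding close() therefore gives a natural place to run end-of-parse logic. In the code below, the class name UserParser, the finished flag, and the on_finished() hook are illustrative assumptions and not part of the library. The import fallback is there only so that the same sketch also runs on Python 3, and the network fetch is replaced by an inline HTML string.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class UserParser(HTMLParser):
    def __init__(self):
        self.users = set()
        self.finished = False  # hypothetical flag, not a library attribute
        # The base __init__ calls self.reset(), so set attributes first.
        HTMLParser.__init__(self)

    def reset(self):
        HTMLParser.reset(self)
        self.users.clear()
        self.finished = False

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            attrs = dict(attrs)
            if attrs.get('class') == 'js_userPictureOuterOnRide':
                user = attrs.get('data-name')
                if user:
                    self.users.add(user)

    def close(self):
        # Let the base class process any data still sitting in its buffer.
        HTMLParser.close(self)
        self.finished = True
        self.on_finished()

    def on_finished(self):
        # Hypothetical hook: override in a subclass to react to end of data.
        pass


html = ('<div class="js_userPictureOuterOnRide" data-name="alice"></div>'
        '<div class="js_userPictureOuterOnRide" data-name="bob"></div>')

parser = UserParser()
parser.feed(html)   # returns once this chunk has been processed
parser.close()      # flushes buffered data, then calls on_finished()
print(sorted(parser.users))  # ['alice', 'bob']
```

If the whole document is passed in a single feed() call, as in the question, the line right after feed() followed by close() is effectively the "parsing finished" moment. The hook is mainly useful when the data arrives in several chunks.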
