Ozgur Vatansever

Reputation: 52143

Python HTMLParser Detecting the End of Data

I am using the HTMLParser library in Python 2.7 to extract information from HTML content fetched from a remote URL. I don't understand how to detect the exact moment when the parser instance finishes parsing the HTML data.

The basic implementation of my parser class looks like this:

import HTMLParser
import urllib3

class MyParser(HTMLParser.HTMLParser):
    def __init__(self, url):
        self.url = url
        self.users = set()

    def start(self):
        self.reset()
        response = urllib3.PoolManager().request('GET', self.url)
        if not str(response.status).startswith('2'):
            raise urllib3.HTTPError('HTTP error here..')
        self.feed(response.data.decode('utf-8'))

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            attrs = dict(attrs)
            if attrs.get('class') == 'js_userPictureOuterOnRide':
                user = attrs.get("data-name")
                if user:
                    self.users.add(user)

    def reset(self):
        HTMLParser.HTMLParser.reset(self)
        self.users.clear()

My question is: how can I detect that the parsing process has finished?

Thanks.

Upvotes: 0

Views: 394

Answers (1)

gog

Reputation: 11347

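HTMLParser is synchronous: by the time feed() returns, every complete element in the data fed so far has been parsed and its callbacks have been called. Incomplete trailing data is buffered until more data is fed or close() is called, so call close() after the last feed() to force processing of anything left over.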

    self.feed(response.data.decode('utf-8'))
    print 'ready!'
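
If you want an explicit "finished" signal rather than relying on the line after feed(), one option is to override close() and set a flag there. The following is only a minimal sketch: the class name UserParser, the finished attribute, and the inline sample HTML are made up for illustration, and the network fetch from the question is left out so the snippet runs on its own. The import fallback is there so it works on both Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7, as in the question


class UserParser(HTMLParser):
    """Hypothetical parser: collects data-name values from matching divs."""

    def __init__(self):
        # Create the attributes first, because HTMLParser.__init__ calls reset().
        self.users = set()
        self.finished = False
        HTMLParser.__init__(self)

    def reset(self):
        HTMLParser.reset(self)
        self.users.clear()
        self.finished = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div' and attrs.get('class') == 'js_userPictureOuterOnRide':
            user = attrs.get('data-name')
            if user:
                self.users.add(user)

    def close(self):
        # close() processes any buffered data as if followed by end-of-file.
        HTMLParser.close(self)
        self.finished = True  # illustrative flag, not part of HTMLParser


# Sample markup standing in for response.data.decode('utf-8').
html = ('<div class="js_userPictureOuterOnRide" data-name="alice"></div>'
        '<div class="js_userPictureOuterOnRide" data-name="bob"></div>')

parser = UserParser()
parser.feed(html)  # callbacks for the complete tags have run when this returns
parser.close()     # marks the end of input
```

After close() returns, parser.finished is set and parser.users holds whatever the callbacks collected, so any "parsing is done" logic can go either right after that call or inside the close() override itself.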

(If I misunderstood your question, please let me know.)

Upvotes: 1
