Beginner

Reputation: 39

Get a certain tag info with HTMLParser()

I have a page with several elements that share the same class, for example five <div class='price'></div>. I need to get the information from one particular element using HTMLParser(). The element I need is at the bottom of that list and has a parent class in the HTML tree. The problem is that my code returns the first div tag, but I need a different one. How do I get it?

I need to extract "1015" from the page, but my code shows 150. Page HTML:

<div class='price'>150</div>
    <div class='form-row'></div>
        <input type="hidden" value="15121" name="add-to-cart">
            <div class='price'>
                ::before
                "1015"
            </div>

My code:

class ParserLyku(HTMLParser):

    price_is_found = is_price_field = None
    _product_info = {}
    _all_prices = []

    def handle_starttag(self, tag, attrs):
        if (not self.price_is_found and
                'class' not in self._product_info and
                tag == 'div'):
            attrs = dict(attrs)
            if attrs.get('class') == 'price':
                self.is_price_field = True

    def handle_data(self, data):
        if (not self.price_is_found and
                self.is_price_field and
                'class' not in self._product_info):
                self._product_info['price'] = data
                self.price_is_found = True

Upvotes: 1

Views: 31

Answers (1)

Andrej Kesely

Reputation: 195573

There are many ways to do that. One possible solution is to count how many <div class="price"> tags we have encountered so far and skip the first one:

from html.parser import HTMLParser

html_doc = '''\
<div class='price'>150</div>
    <div class='form-row'></div>
    <input type="hidden" value="15121" name="add-to-cart">

        <div class='price'>
            "1015"
        </div>
'''


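class ParserLyku(HTMLParser):
    # The start tag we are looking for: <div class="price">.
    to_find = ('div', ('class', 'price'))
    # Void elements never get an end tag, so they must not go on the stack.
    void_tags = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                 'input', 'link', 'meta', 'source', 'track', 'wbr'}

    def __init__(self):
        super().__init__()
        self.__opened_tags = []  # stack of currently open tags
        self.__counter = 0       # number of price divs seen so far
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if (tag, *attrs) == ParserLyku.to_find:
            self.__counter += 1

        if tag not in ParserLyku.void_tags:
            self.__opened_tags.append((tag, *attrs))

    def handle_endtag(self, tag):
        if tag not in ParserLyku.void_tags and self.__opened_tags:
            self.__opened_tags.pop()

    def handle_data(self, data):
        # Keep text only when it sits directly inside a price div
        # and that div is not the first one.
        if (self.__opened_tags
                and self.__opened_tags[-1] == ParserLyku.to_find
                and self.__counter > 1):
            self.prices.append(data.strip())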

parser = ParserLyku()
parser.feed(html_doc)

print(parser.prices)

Prints:

['"1015"']
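If the number of price blocks before the one you want can vary, counting from the top may be fragile. A possible alternative, shown here only as a rough sketch, is to collect the text of every <div class="price"> in document order and then pick the last one by index. The names AllPricesParser, all_prices, VOID_TAGS and sample are made up for this example, and the class check assumes the attribute is exactly "price" (no extra class names):

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are ignored for depth tracking.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


class AllPricesParser(HTMLParser):  # hypothetical name, not from the question
    def __init__(self):
        super().__init__()
        self.depth = 0     # 0 = outside a price div, >= 1 = nesting depth inside one
        self.chunks = []   # text pieces of the price div currently being read
        self.prices = []   # one entry per price div, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Already inside a price div: track nested tags.
            self.depth += 1
        elif tag == 'div' and dict(attrs).get('class') == 'price':
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The price div just closed: store its text without
            # surrounding whitespace or literal double quotes.
            self.prices.append(''.join(self.chunks).strip().strip('"'))

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def all_prices(html):
    parser = AllPricesParser()
    parser.feed(html)
    parser.close()
    return parser.prices


sample = '''\
<div class='price'>150</div>
    <div class='form-row'></div>
    <input type="hidden" value="15121" name="add-to-cart">
        <div class='price'>
            "1015"
        </div>
'''

prices = all_prices(sample)
print(prices[-1])  # the last price block on the page
```

With this approach the parser itself stays generic, and the choice of which price to use (prices[-1], prices[1], and so on) is made after parsing.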

Upvotes: 1
