Reputation: 39
I have a page with several elements that share the same class, for example five <div class='price'></div>
elements. I need to get the content of one particular element using HTMLParser()
. The element I need is near the bottom of that list and sits under a parent element that has its own class in the HTML tree. The problem is that my code returns the first div
tag, but I need a different one. How do I get it?
I need to extract "1015" from the page, but my code returns 150. Page HTML:
<div class='price'>150</div>
<div class='form-row'></div>
<input type="hidden" value="15121" name="add-to-cart">
<div class='price'>
::before
"1015"
</div>
My code:
class ParserLyku(HTMLParser):
    price_is_found = is_price_field = None
    _product_info = {}
    _all_prices = []

    def handle_starttag(self, tag, attrs):
        if (not self.price_is_found and
                'class' not in self._product_info and
                tag == 'div'):
            attrs = dict(attrs)
            if attrs.get('class') == 'price':
                self.is_price_field = True

    def handle_data(self, data):
        if (not self.price_is_found and
                self.is_price_field and
                'class' not in self._product_info):
            self._product_info['price'] = data
            self.price_is_found = True
Upvotes: 1
Views: 31
Reputation: 195573
There are many ways to do that. One possible solution is to count how many <div class="price">
tags we've encountered so far and skip the first one:
from html.parser import HTMLParser

html_doc = '''\
<div class='price'>150</div>
<div class='form-row'></div>
<input type="hidden" value="15121" name="add-to-cart">
<div class='price'>
"1015"
</div>
'''


class ParserLyku(HTMLParser):
    # The start tag we are looking for: <div class="price">
    to_find = ('div', ('class', 'price'))

    def __init__(self):
        HTMLParser.__init__(self)
        self.__opened_tags = []  # stack of start tags seen so far
        self.__counter = 0       # number of <div class="price"> encountered
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if (tag, *attrs) == ParserLyku.to_find:
            self.__counter += 1
        self.__opened_tags.append((tag, *attrs))

    def handle_endtag(self, tag):
        # Guard against a stray end tag arriving with an empty stack
        if self.__opened_tags:
            self.__opened_tags.pop()

    def handle_data(self, data):
        # Keep text only directly inside a price div, skipping the first one
        if (self.__opened_tags
                and self.__opened_tags[-1] == ParserLyku.to_find
                and self.__counter > 1):
            self.prices.append(data.strip())


parser = ParserLyku()
parser.feed(html_doc)
print(parser.prices)
Prints:
['"1015"']
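If the position of the price you need is not fixed, another option is to collect the text of every price div and pick the one you want afterwards (here, the last). The following is only a sketch under a few assumptions: the class name AllPricesParser and the variable last_price are made up for illustration, the wanted price is assumed to be the last one on the page, and the surrounding quotes are assumed to be safe to strip. Unlike the exact tuple comparison above, it also matches divs that carry extra attributes or several classes, and it tracks div nesting so the end of each price div is detected correctly.

```python
from html.parser import HTMLParser

html_doc = '''\
<div class='price'>150</div>
<div class='form-row'></div>
<input type="hidden" value="15121" name="add-to-cart">
<div class='price'>
"1015"
</div>
'''


class AllPricesParser(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.prices = []     # text of every price div, in document order
        self._div_depth = 0  # div nesting inside the current price div (0 = outside)
        self._chunks = []    # text pieces of the price div being read

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self._div_depth:
            # A div nested inside a price div: just track the depth
            self._div_depth += 1
        elif 'price' in (dict(attrs).get('class') or '').split():
            # Entering a new price div: start collecting its text
            self._div_depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == 'div' and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0:
                # The price div is closed: store its collected text
                self.prices.append(''.join(self._chunks).strip())

    def handle_data(self, data):
        if self._div_depth:
            self._chunks.append(data)


parser = AllPricesParser()
parser.feed(html_doc)
parser.close()

# Assumption: the wanted price is the last one; strip the literal quotes around it
last_price = parser.prices[-1].strip('"') if parser.prices else None
print(parser.prices, last_price)
```

With the sample HTML this collects both prices, so you can choose by index instead of hard-coding how many to skip.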
Upvotes: 1