Reputation: 21
Please have a look at the following HTML code:
<section class = "products">
<span class="price-box ri">
<span class="price ">
<span data-currency-iso="PKR">Rs.</span>
<span dir="ltr" data-price="5999"> 5,999</span> </span>
<span class="price -old ">
<span data-currency-iso="PKR">Rs.</span>
<span dir="ltr" data-price="9999"> 9,999</span> </span>
</span>
</section>
The products section contains 40 such blocks, one per item. Every product has a current price, but only some have an old price. When I try to extract the item prices I also get the old prices, so I end up with 69 prices instead of the expected 40. I am missing something, but since I am new to this field I couldn't figure it out. Could someone please help? Thanks.
Upvotes: 1
Views: 671
Reputation: 7248
You can use a CSS attribute selector to match the exact value of the class attribute. For example, here you can use span[class="price "] (note the trailing space) as the selector, and it won't match the old prices.
from bs4 import BeautifulSoup

html = '''
<section class = "products">
<span class="price-box ri">
<span class="price ">
<span data-currency-iso="PKR">Rs.</span>
<span dir="ltr" data-price="5999"> 5,999</span>
</span>
<span class="price -old ">
<span data-currency-iso="PKR">Rs.</span>
<span dir="ltr" data-price="9999"> 9,999</span>
</span>
</span>
</section>'''
soup = BeautifulSoup(html, 'lxml')
for price in soup.select('span[class="price "]'):
    print(price.get_text(' ', strip=True))
Output:
Rs. 5,999
Alternatively, you can pass a custom function to match the class name:
for price in soup.find_all('span', class_=lambda c: c == 'price '):
    print(price.get_text(' ', strip=True))
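One caveat: an exact-string match relies on the trailing space in class="price " surviving parsing, and as far as I know that can differ between Beautiful Soup versions. A minimal sketch that tests class membership instead is below. The names is_current_price and current_prices are made up for illustration, and I am assuming the built-in html.parser backend and that you want the numeric value from data-price rather than the display text.

```python
from bs4 import BeautifulSoup

html = """
<section class="products">
  <span class="price-box ri">
    <span class="price ">
      <span data-currency-iso="PKR">Rs.</span>
      <span dir="ltr" data-price="5999"> 5,999</span>
    </span>
    <span class="price -old ">
      <span data-currency-iso="PKR">Rs.</span>
      <span dir="ltr" data-price="9999"> 9,999</span>
    </span>
  </span>
</section>
"""


def is_current_price(tag):
    """Hypothetical helper: a span that has class 'price' but not '-old'."""
    classes = tag.get("class") or []
    return tag.name == "span" and "price" in classes and "-old" not in classes


soup = BeautifulSoup(html, "html.parser")

current_prices = []
for box in soup.find_all(is_current_price):
    # The amount lives in the inner span that carries a data-price attribute.
    amount = box.find("span", attrs={"data-price": True})
    if amount is not None:
        current_prices.append(int(amount["data-price"]))

print(current_prices)
```

Because the check looks at the individual class names, it should not matter whether the parser keeps or drops the whitespace around them.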
Upvotes: 2