Aftab

Reputation: 21

Python BeautifulSoup get data from span tag

Please have a look at the following HTML code:

<section class = "products">
<span class="price-box ri"> 
<span class="price ">
<span data-currency-iso="PKR">Rs.</span> 
<span dir="ltr" data-price="5999">&nbsp;5,999</span> </span>  
<span class="price -old ">
<span data-currency-iso="PKR">Rs.</span> 
<span dir="ltr" data-price="9999">&nbsp;9,999</span>  </span> 
</span>
</section>

The products section contains 40 such blocks, each holding the prices for one item. Every product has a current price, but not every product has an old price. When I try to extract the item prices, I also get the old prices, so I end up with 69 prices in total instead of 40. I am missing something, but since I am new to this field I couldn't figure it out. Could someone please help? Thanks.

Upvotes: 1

Views: 671

Answers (1)

Keyur Potdar

Reputation: 7248

You can use a CSS selector to match the exact class attribute value. Here, the selector span[class="price "] matches only the current-price spans, because the old-price spans have the attribute value "price -old " instead.

from bs4 import BeautifulSoup

html = '''
<section class = "products">
    <span class="price-box ri"> 
        <span class="price ">
            <span data-currency-iso="PKR">Rs.</span> 
            <span dir="ltr" data-price="5999">&nbsp;5,999</span>
        </span>  
        <span class="price -old ">
            <span data-currency-iso="PKR">Rs.</span> 
            <span dir="ltr" data-price="9999">&nbsp;9,999</span>
        </span> 
    </span>
</section>'''
soup = BeautifulSoup(html, 'lxml')

for price in soup.select('span[class="price "]'):
    print(price.get_text(' ', strip=True))

Output:

Rs. 5,999

Alternatively, you could use a custom function to match the class name:

for price in soup.find_all('span', class_=lambda c: c == 'price '):
    print(price.get_text(' ', strip=True))
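Both approaches above compare against the literal string "price " including its trailing space, so they depend on that whitespace being preserved, which may vary between Beautiful Soup versions and parsers. As a hedged alternative, here is a minimal sketch that filters on the parsed class list instead: keep spans that have the class price but not -old, then read the number from the data-price attribute. The helper names is_current_price and current_prices are made up for illustration, and html.parser is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

html = '''
<section class="products">
    <span class="price-box ri">
        <span class="price ">
            <span data-currency-iso="PKR">Rs.</span>
            <span dir="ltr" data-price="5999">&nbsp;5,999</span>
        </span>
        <span class="price -old ">
            <span data-currency-iso="PKR">Rs.</span>
            <span dir="ltr" data-price="9999">&nbsp;9,999</span>
        </span>
    </span>
</section>'''


def is_current_price(tag):
    """Hypothetical helper: True for <span> tags with class "price" but not "-old"."""
    classes = tag.get('class') or []  # Beautiful Soup exposes class as a list
    return tag.name == 'span' and 'price' in classes and '-old' not in classes


def current_prices(markup):
    """Hypothetical helper: return current prices as integers, one per matching span."""
    soup = BeautifulSoup(markup, 'html.parser')
    prices = []
    for box in soup.find_all(is_current_price):
        # The numeric value lives in the data-price attribute of a nested span.
        amount = box.find('span', attrs={'data-price': True})
        if amount is not None:
            prices.append(int(amount['data-price']))
    return prices


print(current_prices(html))
```

Because the check works on individual class names, the outer price-box span is not matched either, and reading data-price avoids having to strip the currency label and thousands separator from the displayed text.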

Upvotes: 2
