Aftab

Reputation: 21

Extract data from span tag inside div tag

My problem may be trivial, as I am new to web scraping. Please see the following HTML:

<div class="price-container clearfix">
<span class="sale-flag-percent">-40%</span>  
<span class="price-box ri"> 
<span class="price "><span data-currency-iso="PKR">Rs.</span> 
<span dir="ltr" data-price="5999">&nbsp;5,999</span>  </span>  
<span class="price -old "><span data-currency-iso="PKR">Rs.</span> 
<span dir="ltr" data-price="9999">&nbsp;9,999</span>  </span> </span>
</div>

So far I can access the outermost div (class "price-container clearfix"), but I cannot reach the inner spans to get the product's price. Is there a way to access the inner spans and extract the prices?

Upvotes: 0

Views: 1624

Answers (1)

t.m.adam

Reputation: 15376

Given the HTML in your question, selecting the span tags with CSS selectors is straightforward.
For example:

from bs4 import BeautifulSoup

html = '''
<div class="price-container clearfix">
<span class="sale-flag-percent">-40%</span>  
<span class="price-box ri"> 
<span class="price "><span data-currency-iso="PKR">Rs.</span> 
<span dir="ltr" data-price="5999">&nbsp;5,999</span>  </span>  
<span class="price -old "><span data-currency-iso="PKR">Rs.</span> 
<span dir="ltr" data-price="9999">&nbsp;9,999</span>  </span> </span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
tags = soup.select('div.price-container.clearfix span[data-price]')
prices = [i.text.strip() for i in tags]

print(prices)

This expression:

div.price-container.clearfix span[data-price]  

selects every 'span' tag that has a 'data-price' attribute and is a descendant of a 'div' tag with both the 'price-container' and 'clearfix' classes.

The result is a list with the text of both span tags: ['5,999', '9,999']. If you want a separate selector for each price, you could go through the span.price and span.price.-old parent tags.

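# Match on the parent's classes rather than the exact attribute string,
# which is sensitive to the stray whitespace in class="price ".
new_prices = soup.select('span.price:not(.-old) span[data-price]')
old_prices = soup.select('span.price.-old span[data-price]')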

This will result in two lists of tags, one for each price category.
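
If you need numbers rather than display text, the 'data-price' attribute already holds the unformatted value. The following is only a sketch under a couple of assumptions: it relies on the markup from the question, where each 'data-price' span is a direct child of its 'span.price' parent, and the 'new' and 'old' dictionary keys are labels I made up for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<div class="price-container clearfix">
<span class="sale-flag-percent">-40%</span>
<span class="price-box ri">
<span class="price "><span data-currency-iso="PKR">Rs.</span>
<span dir="ltr" data-price="5999">&nbsp;5,999</span>  </span>
<span class="price -old "><span data-currency-iso="PKR">Rs.</span>
<span dir="ltr" data-price="9999">&nbsp;9,999</span>  </span> </span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

prices = {}
for tag in soup.select('span.price span[data-price]'):
    # Assumption: the data-price span sits directly inside span.price,
    # so tag.parent is the element that carries the "-old" class.
    parent_classes = tag.parent.get('class', [])
    # 'new' and 'old' are hypothetical labels, not part of the page.
    label = 'old' if '-old' in parent_classes else 'new'
    # Read the raw attribute instead of parsing "5,999" with its separator.
    prices[label] = int(tag['data-price'])

print(prices)
```

Checking the parent's class list in Python sidesteps any quoting or whitespace issues in the selector itself. If the real page nests the spans differently, tag.parent would need to be adjusted accordingly.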

Upvotes: 1
