Muhammad Faisal

Reputation: 131

Scraping data from the next span within the same h1 tag in BeautifulSoup

Hi, I am trying to scrape the subcategory:

subcat = soup.find(class_='bread-block-wrap').find(class_='breadcrumb-keyword-bg').find(class_='breadcrumb-keyword list-responsive-container').find(class_='ui-breadcrumb').find('h1')

and this is the output

<h1>
<a href="//www.aliexpress.com/category/509/cellphones-telecommunications.html" title="Cellphones &amp; Telecommunications"> Cellphones &amp; Telecommunications</a>
<span class="divider">&gt;</span> <span> Mobile Phones</span>
</h1>

So there are two span tags. The first is

<span class="divider">&gt;</span>

and the second is

<span> Mobile Phones</span>

I want to scrape the text in the second span tag. Can someone please help?

Upvotes: 0

Views: 838

Answers (3)

Razzaghnoori

Reputation: 374

Another solution is to use CSS selectors, which let you avoid chaining find() calls over and over. In your case, this:

results = soup.select(".bread-block-wrap .breadcrumb-keyword-bg .breadcrumb-keyword.list-responsive-container .ui-breadcrumb h1 span")

returns both span tags in a list. You can then use the second one, results[1].
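
As a minimal sketch of picking the second element from that list: the h1 markup below is copied from the question, but the wrapper div elements are an assumption reconstructed from the class names in the selector, so the real page may be nested differently.

```python
from bs4 import BeautifulSoup

# The <h1> comes from the question; the surrounding <div> wrappers are assumed.
html = """
<div class="bread-block-wrap">
  <div class="breadcrumb-keyword-bg">
    <div class="breadcrumb-keyword list-responsive-container">
      <div class="ui-breadcrumb">
        <h1>
          <a href="//www.aliexpress.com/category/509/cellphones-telecommunications.html" title="Cellphones &amp; Telecommunications"> Cellphones &amp; Telecommunications</a>
          <span class="divider">&gt;</span> <span> Mobile Phones</span>
        </h1>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.select(
    ".bread-block-wrap .breadcrumb-keyword-bg "
    ".breadcrumb-keyword.list-responsive-container .ui-breadcrumb h1 span"
)

# Guard against pages where the second span is missing.
subcategory = results[1].get_text(strip=True) if len(results) > 1 else None
print(subcategory)
```

The length check is optional, but it avoids an IndexError on pages whose breadcrumb has no subcategory.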

CSS selectors also give you many other useful tools. A CSS selector cheatsheet is a good place to start.

Upvotes: 0

QHarr

Reputation: 84465

You can use the CSS nth-of-type selector:

h1 span:nth-of-type(2)

i.e.

items = soup.select("h1 span:nth-of-type(2)")

Then iterate over the list.

If only one match is possible, then simply:

item = soup.select_one("h1 span:nth-of-type(2)")
print(item.text.strip())
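
If the position of the span might vary, a possible alternative (not part of the answer above) is to exclude the separator by its class with :not(). This sketch assumes the separator always carries class="divider", as it does in the question's markup.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<h1>
<a href="//www.aliexpress.com/category/509/cellphones-telecommunications.html" title="Cellphones &amp; Telecommunications"> Cellphones &amp; Telecommunications</a>
<span class="divider">&gt;</span> <span> Mobile Phones</span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")

# Select the first span inside the h1 that is not the "divider" separator.
item = soup.select_one("h1 span:not(.divider)")

# select_one returns None when nothing matches, so check before reading text.
label = item.get_text(strip=True) if item is not None else None
print(label)
```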

Upvotes: 1

Bitto

Reputation: 8215

You can use the find_all() function to get all the span tags in a list, then index the second one and read its .text attribute.

subcat.find_all('span')[1].text

Should output

 Mobile Phones

Demo

from bs4 import BeautifulSoup
html="""
<h1>
<a href="//www.aliexpress.com/category/509/cellphones-telecommunications.html" title="Cellphones &amp; Telecommunications"> Cellphones &amp; Telecommunications</a>
<span class="divider">&gt;</span> <span> Mobile Phones</span>
</h1>
"""
soup=BeautifulSoup(html,'html.parser')
h1=soup.find('h1')
print(h1.find_all('span')[1].text.strip())

Output

Mobile Phones
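
Since the question asks for the "next" span, another way to sketch this is to locate the divider and step to its next span sibling with find_next_sibling(). This assumes the subcategory span always directly follows a span with class="divider" inside the same h1.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<h1>
<a href="//www.aliexpress.com/category/509/cellphones-telecommunications.html" title="Cellphones &amp; Telecommunications"> Cellphones &amp; Telecommunications</a>
<span class="divider">&gt;</span> <span> Mobile Phones</span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

# Find the separator span, then move to the next <span> sibling after it.
divider = h1.find("span", class_="divider")
next_span = divider.find_next_sibling("span") if divider else None

subcategory = next_span.get_text(strip=True) if next_span else None
print(subcategory)
```

Passing a tag name to find_next_sibling() skips the whitespace text node between the two spans.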

Upvotes: 1
