Reputation: 1115
I am trying to get the highlighted text "frei ab 01.05.2017" below. The problem is that the class "section_content iw_right" appears 19 times on that page. I could do a find_all and return only the 11th element, but on some of the sites I want to scrape the class appears a different number of times, so a fixed index would not always catch the right element. Any ideas? Thanks!
Upvotes: 1
Views: 1853
Reputation: 474071
One way to get to the desired element is to use the preceding label: locate the span element with the text "Erdgeschoss" and find its next strong sibling:
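# soup is the BeautifulSoup object built from the page's HTML
label = soup.find("span", string="Erdgeschoss")  # string= is the current name of the older text= argument
print(label.find_next_sibling("strong").get_text())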
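For reference, here is a minimal self-contained sketch of the same label-based lookup. The markup is hypothetical (the real page was not posted), so the "Etage" block and the exact nesting are assumptions, and value_for_label is just an illustrative helper name. It returns None instead of raising when the label or the strong element is missing, which helps when scraping several sites with slightly different layouts.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page was not included in the question.
html_doc = """
<div class="section_content iw_right">
  <p><span>Etage</span> <strong>2</strong></p>
</div>
<div class="section_content iw_right">
  <p><span>Erdgeschoss</span> <strong>frei ab 01.05.2017</strong></p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def value_for_label(soup, label_text):
    """Return the text of the <strong> following the <span> whose text is label_text, or None."""
    # Find the label by its text rather than by its position among repeated classes.
    label = soup.find("span", string=label_text)
    if label is None:
        return None
    # The value is assumed to be the next <strong> sibling of the label.
    value = label.find_next_sibling("strong")
    return value.get_text(strip=True) if value is not None else None


result = value_for_label(soup, "Erdgeschoss")
print(result)
```

Because the lookup keys on the label text, it does not matter how many "section_content iw_right" blocks a given page has.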
Upvotes: 1
Reputation: 2233
You can use lxml, which can be an order of magnitude faster than BeautifulSoup.
The following code shows one way to get the desired result.
from lxml import html
html_string = """
<div class="clear">
<div class="section_content iw_right">
<p>
<span>
</span>
<strong>hello</strong>
<br>
<strong>gen</strong>
</p>
</div>
</div>
<div class="clear">
<p>
<span>
</span>
<strong>hello1</strong>
<br>
<strong>gen1</strong>
</p>
</div>
"""
root = html.fromstring(html_string)
r_xp = [elem.xpath('.//p/strong/text()')[0] for elem in root.xpath('//div[@class="clear"]')]
print(r_xp)
Note the absence of the div with class "section_content iw_right" from the second div in the example html_string.
The above code prints:
['hello', 'hello1']
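If the goal is to pick the value by its label rather than by position, the same idea can be expressed in XPath. This is only a sketch on hypothetical markup (the span texts and nesting are assumptions, since the real page was not posted): it selects the span whose text is "Erdgeschoss" and takes the first strong sibling after it.

```python
from lxml import html

# Hypothetical markup: the real page was not included in the question.
html_string = """
<div class="clear">
  <div class="section_content iw_right">
    <p><span>Etage</span> <strong>2</strong></p>
  </div>
</div>
<div class="clear">
  <div class="section_content iw_right">
    <p><span>Erdgeschoss</span> <strong>frei ab 01.05.2017</strong></p>
  </div>
</div>
"""

root = html.fromstring(html_string)

# normalize-space() ignores stray whitespace around the label text;
# following-sibling::strong[1] is the nearest <strong> after the label.
matches = root.xpath(
    '//span[normalize-space()="Erdgeschoss"]/following-sibling::strong[1]/text()'
)

# xpath() returns a list, so guard against pages where the label is missing.
value = matches[0] if matches else None
print(value)
```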
Upvotes: 1