Reputation: 2432
I'm trying to extract some specific text with BeautifulSoup but can't figure it out.
All I need is the number in the "THIS TEXT" block (56789), not the ones under "SOME TEXT".
Can someone point out what's wrong with my code?
from bs4 import BeautifulSoup

def foo():
    response = """
    <div class="data_content_blog">
        <div class="data_content">
            <h5 class="large"> SOME TEXT </h5>
            <p class="large some-text">12345</p>
        </div>
    </div>
    <div class="data_content_blog">
        <div class="data_content">
            <h5 class="large"> SOME TEXT </h5>
            <p class="large some-text">34567</p>
        </div>
    </div>
    <div class="data_content_blog">
        <div class="data_content">
            <h5 class="large"> THIS TEXT </h5>
            <p class="large this-text">56789</p>
        </div>
    </div>
    """
    soup = BeautifulSoup(response, features="html.parser")
    soup_1 = soup.find_all("div", {"class": "data_content"})
    for s_1 in soup_1:
        s_2 = s_1.find("p").attrs["class"][0]
        s_3 = s_1.find("p").attrs["class"][1]
        if s_2 == "large" and s_3 == "this-text":
            print(s_2, s_3, "<- here")
            # get the number 56789 ???
        else:
            print(s_2, s_3)
Upvotes: 0
Views: 20
Reputation: 195438
If the class "this-text" is unique, you can select that element and then call
.find_previous() to get the tag that precedes it (here, the <h5>):
num = soup.select_one(".this-text") # or soup.find(class_="this-text")
h5 = num.find_previous()
print(h5.text, num.text)
Prints:
THIS TEXT 56789
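If the class name can't be relied on, another option is to match on the heading text instead. The following is only a sketch under that assumption: `number_for_heading` and the shortened `HTML` sample are hypothetical names made up for illustration, not part of the question's code.

```python
from bs4 import BeautifulSoup

# Sample markup shortened from the question (hypothetical constant name).
HTML = """
<div class="data_content">
  <h5 class="large"> SOME TEXT </h5>
  <p class="large some-text">12345</p>
</div>
<div class="data_content">
  <h5 class="large"> THIS TEXT </h5>
  <p class="large this-text">56789</p>
</div>
"""


def number_for_heading(html, heading):
    """Return the <p> text of the first block whose <h5> equals `heading`, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for block in soup.find_all("div", class_="data_content"):
        h5 = block.find("h5")
        p = block.find("p")
        if h5 is None or p is None:
            continue  # skip blocks that lack a heading or a value
        # get_text(strip=True) drops the padding around " THIS TEXT ".
        if h5.get_text(strip=True) == heading:
            return p.get_text(strip=True)
    return None


print(number_for_heading(HTML, "THIS TEXT"))  # 56789
```

In the loop from the question, `s_1.find("p").text` at the marked comment would give the same number.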
Upvotes: 1