Reputation: 13
I have a construction like this:
<p>blablalba<a href='somelink'>blabla</a></p>
I need to find any tag, not only "a", that is placed inside the text. For example:
<p>blablalba<strong>blabla</strong>blalba</p>
How can I do this?
Upvotes: 1
Views: 53
Reputation: 195573
To find tags that have text siblings on both sides, you can pass a custom filter function to find_all:
from bs4 import BeautifulSoup
html_doc = """
<p>blablalba<a href='somelink'>NOT FROM BOTH SIDES</a></p>
<p>blablalba<a href='somelink'>I WANT THIS</a>xxx</p>
<p>blablalba<strong>I WANT THIS</strong>blalba</p>
<p><strong>NOT FROM BOTH SIDES</strong>blalba</p>
<p>blalba<strong>NOT FROM BOTH SIDES</strong></p>
"""
soup = BeautifulSoup(html_doc, "html.parser")
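def find_tags(t):
    # nearest text siblings before and after the tag (None if there are none)
    prv = t.find_previous_sibling(string=True)
    nxt = t.find_next_sibling(string=True)
    # both must exist and contain non-whitespace characters
    return bool(prv and nxt and prv.strip() and nxt.strip())

for tag in soup.find_all(find_tags):
    print(tag)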
Prints:
<a href="somelink">I WANT THIS</a>
<strong>I WANT THIS</strong>
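Note that find_previous_sibling and find_next_sibling search all siblings on that side, so a tag that is separated from the text by another tag still matches. If you only want tags whose immediate neighbours are text, a stricter variant could look like the sketch below. The helper names is_nonblank_text and has_adjacent_text and the sample markup are made up for illustration, and comments in the HTML would also count as text here:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html_doc = """
<p>blablalba<a href='somelink'>I WANT THIS</a>xxx</p>
<p>text<b>NOT ADJACENT ON THE RIGHT</b><i>NOT ADJACENT ON THE LEFT</i>text</p>
<p><strong>NOT FROM BOTH SIDES</strong>blalba</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def is_nonblank_text(node):
    # True only for a text node that contains non-whitespace characters
    return isinstance(node, NavigableString) and bool(node.strip())


def has_adjacent_text(tag):
    # check only the immediate neighbours, not any earlier or later sibling
    return is_nonblank_text(tag.previous_sibling) and is_nonblank_text(tag.next_sibling)


matches = soup.find_all(has_adjacent_text)
for tag in matches:
    print(tag)
```

With this sample only the a tag is printed; the b and i tags are skipped because each has another tag as a direct neighbour.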
Upvotes: 1