Артем

Reputation: 13

BeautifulSoup: how to find a tag wrapped in text

I have a construction like this:

<p>blablalba<a href='somelink'>blabla</a></p>

I need to find any tag, not only "a", that is placed inside the text. For example:

<p>blablalba<strong>blabla</strong>blalba</p>

How can I do this?

Upvotes: 1

Views: 53

Answers (1)

Andrej Kesely

Reputation: 195573

To find tags that have text siblings on both sides, you can pass a custom filter function to find_all:

from bs4 import BeautifulSoup

html_doc = """
<p>blablalba<a href='somelink'>NOT FROM BOTH SIDES</a></p>
<p>blablalba<a href='somelink'>I WANT THIS</a>xxx</p>
<p>blablalba<strong>I WANT THIS</strong>blalba</p>

<p><strong>NOT FROM BOTH SIDES</strong>blalba</p>
<p>blalba<strong>NOT FROM BOTH SIDES</strong></p>
"""

soup = BeautifulSoup(html_doc, "html.parser")


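def find_tags(t):
    # Nearest text-node sibling on each side (None if there is none).
    # `string=` replaces the deprecated `text=` argument (bs4 >= 4.4.0).
    prv = t.find_previous_sibling(string=True)
    nxt = t.find_next_sibling(string=True)

    # Match only when both text nodes exist and are not just whitespace.
    return bool(prv and nxt and prv.strip() and nxt.strip())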


for tag in soup.find_all(find_tags):
    print(tag)

Prints:

<a href="somelink">I WANT THIS</a>
<strong>I WANT THIS</strong>
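
Note that find_previous_sibling and find_next_sibling search past other tags, so a tag still matches when another tag sits between it and the text. If you only want tags whose immediate neighbours are text, a stricter variant could check previous_sibling and next_sibling directly. This is a minimal sketch, not part of the original answer: the names is_nonblank_text and wrapped_in_text and the sample HTML are my own, chosen for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def is_nonblank_text(node):
    """Return True if node is a text node with non-whitespace content.

    Comments are also NavigableString subclasses, so this sketch
    treats them as text.
    """
    return isinstance(node, NavigableString) and bool(node.strip())


def wrapped_in_text(tag):
    """Match tags whose immediate neighbours on both sides are non-blank text."""
    return is_nonblank_text(tag.previous_sibling) and is_nonblank_text(tag.next_sibling)


html_doc = """
<p>left<a href='somelink'>first</a><strong>second</strong>right</p>
<p>left<em>third</em>right</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Only <em> has text directly on both sides; <a> and <strong> touch each other.
matches = [tag.name for tag in soup.find_all(wrapped_in_text)]
print(matches)
```

On this input the stricter check keeps only the em tag, whereas find_tags above would also return the a and strong tags in the first paragraph. Which behaviour you want depends on whether adjacent inline tags should count as "wrapped in text".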

Upvotes: 1
