Beautiful Soup: Remove Tags that only contain href

Question

I'm getting back a list of specific tags from BeautifulSoup. Some of the tags contain only links, with no further text. When I use the get_text() method on these, I get the link descriptions.

But when the tag contains only an <a> element, I want to ignore it.

Tag: <p>text1 <a href="...">desc</a> text2</p> -> result: text1 desc text2 (OKAY)

Tag: <p><a href="...">desc</a></p> -> result: desc (NOT OKAY)

When a tag contains only a link, I want to filter it out. How can I do that?

alecxe · Accepted Answer

The idea is to iterate over the p tags and check whether each one has exactly one child and that child is an a tag:

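from bs4 import BeautifulSoup

# Sample markup; the href values are placeholders.
data = """
<div>
    <p class="abc">text1 <a href="link1">desc1</a> text2</p>
    <p class="abc"><a href="link2">desc2</a></p>
    <p class="abc"><a href="link3">desc3</a>text3</p>
    <p class="abc">text4<a href="link4">des4</a></p>
    <p class="abc">text5</p>
</div>
"""
soup = BeautifulSoup(data, "html.parser")
for p in soup('p', class_='abc'):
    # A link-only paragraph has exactly one child, and that child is an <a> tag.
    if len(p.contents) == 1 and p.contents[0].name == 'a':
        print(p)

prints:

<p class="abc"><a href="link2">desc2</a></p>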

FYI, .contents is a list of the tag's direct children, which includes text nodes as well as nested tags.
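
Since the question is about skipping link-only tags when collecting text, the same check can be inverted. Below is a minimal sketch, not a definitive implementation: the helper name is_link_only is hypothetical, the markup and href values are placeholders, and the whitespace handling is an added assumption (real pages often have spaces or newlines around the link, which show up as extra text nodes in .contents).

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def is_link_only(tag):
    """Return True if the tag's only meaningful child is a single <a> element."""
    # Drop whitespace-only text nodes so '<p> <a>..</a> </p>' still counts as link-only.
    children = [
        child for child in tag.contents
        if not (isinstance(child, NavigableString) and not child.strip())
    ]
    return (
        len(children) == 1
        and isinstance(children[0], Tag)
        and children[0].name == "a"
    )


# Placeholder markup; the class name and href values are illustrative only.
html = """
<p class="abc">text1 <a href="link1">desc1</a> text2</p>
<p class="abc"> <a href="link2">desc2</a> </p>
<p class="abc">text5</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep the text of every paragraph except the link-only ones.
texts = [
    p.get_text()
    for p in soup.find_all("p", class_="abc")
    if not is_link_only(p)
]
print(texts)  # ['text1 desc1 text2', 'text5']
```

If the link-only tags should be dropped from the tree itself rather than just skipped, calling p.decompose() on each match is one option.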
