Get Certain Tags Within Parent Tag Using Beautifulsoup4

Question

I am using beautifulsoup4 with Python to scrape content from the web, and I am trying to extract content from specific HTML tags while ignoring others.

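I have the following HTML:

<div class="the-one-i-want">
    <p>
        "random text content here and about"
    </p>
    <p>
        "random text content here and about"
    </p>
    <p>
        "random text content here and about"
    </p>
    <div>
        <!-- other content -->
    </div>
    <p>
        "random text content here and about"
    </p>
    <p>
        "random text content here and about"
    </p>
</div>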

My goal is to understand how to instruct Python to get only the <p> elements from within the parent <div class="the-one-i-want">, ignoring any other elements nested within it.

Currently, I am locating the content of the parent div by the following method:

content = soup.find('div', class_='the-one-i-want')

However, I can't figure out how to narrow that down so that only the <p> tags are extracted, without getting an error.

Padraic Cunningham · Accepted Answer

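h = """
<div class="the-one-i-want">
    <p>
        "random text content here and about"
    </p>
    <p>
        "random text content here and about"
    </p>
    <p>
        "random text content here and about"
    </p>
    <div>
        <!-- other content -->
    </div>
    <p>
        "random text content here and about"
    </p>
    <p>
        "random text content here and about"
    </p>
</div>
"""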

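You can just call find_all("p") on the div once you have found it:

from bs4 import BeautifulSoup

# Name the parser explicitly; omitting it makes bs4 guess one and emit a warning.
soup = BeautifulSoup(h, "html.parser")

# Find the parent div by class, then collect every <p> inside it.
print(soup.find("div", "the-one-i-want").find_all("p"))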

Or use a CSS selector:

print(soup.select("div.the-one-i-want p"))

Both will give you:

[<p>
        "random text content here and about"
    </p>, <p>
        "random text content here and about"
    </p>, <p>
        "random text content here and about"
    </p>, <p>
        "random text content here and about"
    </p>, <p>
        "random text content here and about"
    </p>]

find_all only searches the descendants of the div with the class the-one-i-want. The same applies to the select call, because the selector is scoped to that div.
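Because both calls match descendants at any depth, a <p> that sits inside a nested element would also be returned. If only the direct children are wanted, a minimal sketch could look like the one below. The markup, including the nested <p>, is hypothetical and only there to show the difference; recursive=False and the ">" child combinator are standard bs4 features. The None check is there because find returns None when nothing matches, which is a common source of errors when chaining calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the nested <div> and its inner <p> are illustrative only.
html = """
<div class="the-one-i-want">
    <p>first</p>
    <div>
        <p>nested</p>
    </div>
    <p>second</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", class_="the-one-i-want")

# find() returns None when nothing matches, so guard before chaining calls.
if parent is None:
    raise ValueError("no div with class 'the-one-i-want' found")

# Every <p> below the parent, at any depth.
all_paragraphs = parent.find_all("p")

# Only <p> tags that are direct children of the parent.
direct_paragraphs = parent.find_all("p", recursive=False)

# The same restriction expressed with the CSS child combinator.
direct_via_css = soup.select("div.the-one-i-want > p")

# get_text(strip=True) returns the text without surrounding whitespace.
all_texts = [p.get_text(strip=True) for p in all_paragraphs]
direct_texts = [p.get_text(strip=True) for p in direct_paragraphs]
css_texts = [p.get_text(strip=True) for p in direct_via_css]

print(all_texts)     # ['first', 'nested', 'second']
print(direct_texts)  # ['first', 'second']
```

For the HTML in the question, where the <p> tags are all direct children of the parent, the descendant and direct-child forms should return the same five elements.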
