Stephan Psaras

Reputation: 87

How to have BeautifulSoup's select method return a list from searching for two selectors

import requests
from bs4 import BeautifulSoup

def getPage(url):
    # Return the parsed page, or None if the request fails
    try:
        req = requests.get(url)
    except requests.exceptions.RequestException:
        return None
    return BeautifulSoup(req.text, 'html.parser')

bs = getPage('https://www.oreilly.com/pub/e/3094')
bs.select('#contained div')

which outputs

[<div itemprop="description">
 <h1 class="thankyou-hide" style="max-width:100%; font-size: 1.875em; line-height: 1.6em; margin: 30px 0 0px 0; color: #232323; font-family: 'guardian-text-oreilly',Helvetica,sans-serif; -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; letter-spacing: -.01em; font-weight: 200;">Description:</h1>
 <p>
 Thanks to the growth of the Python scientific community, Python now has access to a fast and reliable set of high-performance libraries. This, combined with the elegance and power of the language, makes Python an irresistible choice for performance-critical applications.
 </p>
 <p>
 In this webcast you will:
 </p>
 <ul>
 <li>Learn the best tips and tricks to get the most out of the NumPy library</li>
 <li>Upgrade your applications' performances by using parallel processing</li>
 </ul>
 <h3>About Gabriele Lanaro</h3>
 <p>
 Gabriele Lanaro is a PhD candidate at the University of British Columbia, in the field of molecular simulation. He writes high-performance Python code to analyze chemical systems in large-scale simulations. He created Chemlab — a high performance visualization software in Python—and emacs-for-python—a collection of Emacs extensions that facilitate working with Python code in the Emacs text editor.
 </p>
 </div>]

I want to use the .select() method to return a list that includes both the p and li elements. So instead of just bs.select('#contained div p'), I want something like bs.select('#contained div p & li'). Any suggestions?

Alternatively, I would like to know whether it is possible to select everything between the h1 and the h3 instead.

Upvotes: 0

Views: 31

Answers (1)

elyas

Reputation: 1415

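BeautifulSoup.select() accepts the usual CSS selectors, including comma-separated selector lists, where an element matches if it satisfies any one of the listed selectors. So the following should give you all of the <p> and <li> elements: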

bs.select('#contained div p, #contained div li')
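
To try the comma-separated selector without fetching the live page, here is a minimal sketch. SAMPLE_HTML is a made-up stand-in for the structure shown in the question, not the real page. The :is() form at the end is an alternative spelling that assumes a BeautifulSoup version whose bundled selector engine (soupsieve) supports it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure in the question (not the real page).
SAMPLE_HTML = """
<div id="contained">
  <div itemprop="description">
    <h1>Description:</h1>
    <p>Intro paragraph.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
    <h3>About the speaker</h3>
    <p>Bio paragraph.</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The comma separates two complete selectors; an element matching either one is returned.
matches = soup.select("#contained div p, #contained div li")
texts = [tag.get_text(strip=True) for tag in matches]

# Alternative spelling with :is(), which avoids repeating the "#contained div" prefix.
matches_is = soup.select("#contained div :is(p, li)")
texts_is = [tag.get_text(strip=True) for tag in matches_is]

print(texts)
```

Note that each side of the comma has to be a full selector: '#contained div p, li' would match every li in the document, not just those inside #contained.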

If you want to select the elements between the h1 and the h3 specifically, that's a little more complex:

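# Locate the two boundary headings
h1 = bs.select_one('#contained div h1')
h3 = bs.select_one('#contained div h3')

# Every descendant element in document order, sliced to those after h1 and before h3
result_set = bs.select('#contained div *')
result_set[result_set.index(h1)+1:result_set.index(h3)]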
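
A possible variation, sketched here on made-up markup (SAMPLE_HTML is hypothetical), walks the siblings that follow the h1 and stops at the h3. It compares elements by identity rather than with list.index(), which relies on Tag equality and so could pick an earlier element that happens to have the same name, attributes and contents. Unlike the slice above, this returns only the top-level elements between the headings (the ul, but not its nested li children), which may or may not be what you want.

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure in the question (not the real page).
SAMPLE_HTML = """
<div id="contained">
  <div itemprop="description">
    <h1>Description:</h1>
    <p>Intro paragraph.</p>
    <ul>
      <li>First item</li>
      <li>Second item</li>
    </ul>
    <h3>About the speaker</h3>
    <p>Bio paragraph.</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

h1 = soup.select_one("#contained div h1")
h3 = soup.select_one("#contained div h3")

# find_next_siblings(True) yields the Tag siblings after h1, skipping whitespace strings.
# takewhile stops at the first sibling that is the h3 object itself.
between = list(takewhile(lambda tag: tag is not h3, h1.find_next_siblings(True)))
names = [tag.name for tag in between]

print(names)
```

This assumes the h1 and h3 share the same parent, as they do in the output shown in the question; if they did not, the loop would run through all remaining siblings without ever meeting the h3.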

Upvotes: 1
