Beautifulsoup find different sections while preserving order

Question

I want to use BeautifulSoup to extract the text of span elements with one class value and div elements with a different class value, while preserving the order in which they appear in the page.

The following works, except that it does not preserve the order: the resulting list has all of the div elements at the end, rather than where they occur in the page.

extract = soup.findAll('span', {"class": "value1"})  
extract += soup.findAll('div', {"class": "value2"})

Note: this is similar to, but slightly different from, the question "BeautifulSoup findAll() given multiple classes?", as I am specifically looking at span and div tags.

Roman Susi · Accepted Answer

Nothing prevents you from searching on the classes first and then filtering out the wrong tags. Extending the answer you mention:

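from bs4 import BeautifulSoup

# html is assumed to hold the markup of the page being parsed
soup = BeautifulSoup(html, 'html.parser')

# Match any tag carrying either class, then keep only div and span
for e in soup.findAll(True, {"class": ["class1", "class2"]}):
    if e.name in ("div", "span"):
        print(e)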

The filter can also be written as a one-liner:

[e for e in soup.findAll(True, {"class":["class1", "class2"]}) if e.name in ("div", "span")]
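To illustrate why this keeps the order, here is a minimal sketch. The sample markup is invented for the illustration and is not from the original post. Because the search is a single pass over the tree, the matches come back in document order rather than grouped by tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<div class="class2">first</div>
<span class="class1">second</span>
<p class="class1">skipped: wrong tag</p>
<div class="class2">third</div>
<span class="other">skipped: wrong class</span>
"""

soup = BeautifulSoup(html, "html.parser")

# One search over the whole tree: any tag with either class,
# then keep only div and span elements.
matches = [
    e
    for e in soup.find_all(True, {"class": ["class1", "class2"]})
    if e.name in ("div", "span")
]

texts = [e.get_text() for e in matches]
print(texts)  # ['first', 'second', 'third']
```

The div elements are interleaved with the span in the result, which is what the two separate findAll calls in the question could not do.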

Passing a list of tag names as the first argument also works:

soup.findAll(["div", "span"], {"class": ["class1", "class2"]})

See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-name-argument for the documentation on what can be passed as the first argument to findAll.
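Note that the calls above accept either tag with either class. If the pairing matters (span only with value1, div only with value2, as in the question), one option is to pass a function as the first argument, or to use a CSS selector list with select(). The sketch below assumes invented sample markup; the names wanted and is_wanted are hypothetical helpers for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<span class="value1">a</span>
<div class="value1">skipped: div with the span's class</div>
<div class="value2">b</div>
<span class="value2">skipped: span with the div's class</span>
<span class="value1 extra">c</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical mapping: tag name -> class that tag must carry.
wanted = {"span": "value1", "div": "value2"}


def is_wanted(tag):
    """Return True when the tag carries the class required for its name."""
    required = wanted.get(tag.name)
    return required is not None and required in tag.get("class", [])


# A function passed as the first argument is called once per tag,
# so results follow document order.
extract = soup.find_all(is_wanted)

# The same pairing expressed as a CSS selector list.
by_selector = soup.select("span.value1, div.value2")

extract_texts = [e.get_text() for e in extract]
selector_texts = [e.get_text() for e in by_selector]
print(extract_texts)
print(selector_texts)
```

Both approaches walk the tree once, so span and div matches stay interleaved as they appear in the page.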
