kyrenia

Beautifulsoup find different sections while preserving order

I am looking to use BeautifulSoup to extract the text of span elements with one particular class value and of div elements with a different class value, while preserving the order in which they appear in the page.

The following works, except that it does not preserve the order: the resulting list has all of the div elements at the end, rather than where they occur in the page.

extract = soup.findAll('span', {"class": "value1"})  
extract += soup.findAll('div', {"class": "value2"})
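A minimal sketch reproducing the problem, assuming hypothetical markup in which spans with class value1 and divs with class value2 alternate. Each find_all() call (the current spelling of findAll()) returns its own matches in document order, so concatenating the two lists groups the results by tag:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: span.value1 and div.value2 interleaved.
html = """
<body>
  <span class="value1">a</span>
  <div class="value2">b</div>
  <span class="value1">c</span>
  <div class="value2">d</div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Each call returns its own matches in document order...
extract = soup.find_all("span", {"class": "value1"})
# ...but appending the second list puts every div after every span.
extract += soup.find_all("div", {"class": "value2"})

print([e.get_text() for e in extract])  # ['a', 'c', 'b', 'd']
```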

Note: this is similar to, but slightly different from, the question "BeautifulSoup findAll() given multiple classes?", as I am specifically looking at span and div tags.

Answers (1)

Roman Susi

Nothing prevents you from filtering out the unwanted tags afterwards. Extending the answer you mention:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<html><body><div class="class1"></div><i class="class1"></i><span class="class2"></span><div class="class1"></div></body></html>', "html.parser")
# Match any tag that has either class, then keep only div and span (results stay in document order)
for e in soup.findAll(True, {"class": ["class1", "class2"]}):
    if e.name in ("div", "span"):
        print(e)

The filter can also be written as a one-liner:

[e for e in soup.findAll(True, {"class":["class1", "class2"]}) if e.name in ("div", "span")]

In fact, even this works:

soup.findAll(["div", "span"], {"class":["class1", "class2"]})
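Applied to the question's value1/value2 classes, the tag-list form keeps document order, but the tag names and class values are matched independently of each other, so it would also pick up a span with class value2 or a div with class value1. A sketch with hypothetical markup showing that behaviour:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the last two elements have the "wrong" tag/class pairing.
html = (
    '<span class="value1">a</span>'
    '<div class="value2">b</div>'
    '<span class="value1">c</span>'
    '<div class="value1">x</div>'    # div carrying the span's class
    '<span class="value2">y</span>'  # span carrying the div's class
)
soup = BeautifulSoup(html, "html.parser")

# Any listed tag name combined with any listed class value matches.
found = soup.find_all(["div", "span"], {"class": ["value1", "value2"]})
print([e.get_text() for e in found])  # ['a', 'b', 'c', 'x', 'y']
```

If the page never mixes the pairs, this is the simplest option; otherwise the tag/class combination has to be checked explicitly.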

See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-name-argument for the documentation on what can be passed as the first argument to findAll().
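When each tag must carry its own class (span with value1, div with value2), one option is to pass a function as the filter: find_all() calls it for every tag and keeps those for which it returns True, still in document order. The sketch below assumes hypothetical markup and a hypothetical WANTED mapping; it also shows a CSS selector group via select(), an alternative not mentioned in the answer, which relies on the soupsieve package that recent beautifulsoup4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "x" and "y" have the wrong tag/class pairing.
html = (
    '<span class="value1">a</span>'
    '<div class="value2">b</div>'
    '<span class="value1">c</span>'
    '<div class="value1">x</div>'
    '<span class="value2">y</span>'
)
soup = BeautifulSoup(html, "html.parser")

# Hypothetical mapping: tag name -> class that tag must have.
WANTED = {"span": "value1", "div": "value2"}

def wanted(tag):
    """Return True only for span.value1 and div.value2."""
    required = WANTED.get(tag.name)
    # 'class' is multi-valued, so tag.get("class") is a list of class names.
    return required is not None and required in tag.get("class", [])

# A function filter: matches are returned in document order.
by_function = soup.find_all(wanted)

# Alternative: a CSS selector group (requires soupsieve).
by_css = soup.select("span.value1, div.value2")

print([e.get_text() for e in by_function])  # ['a', 'b', 'c']
print([e.get_text() for e in by_css])       # ['a', 'b', 'c']
```

The function form is the more flexible of the two, since it can express any per-tag condition in plain Python.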
