Reputation: 5575
I am looking to use BeautifulSoup to extract the text in span elements with a particular class value, and also in div elements with a different class value, while preserving the order in which they occur.
The following works, except that it does not preserve the order (i.e. the list has all of the div elements at the end, rather than where they occur in the page):
extract = soup.findAll('span', {"class": "value1"})
extract += soup.findAll('div', {"class": "value2"})
Note - this is similar to, but slightly different from, the question "BeautifulSoup findAll() given multiple classes?", as I am specifically looking in span and div tags.
Upvotes: 2
Views: 5919
Reputation: 4199
Nothing prevents filtering out wrong tags. Extending the answer you mention:
from bs4 import BeautifulSoup

soup = BeautifulSoup('<html><body><div class="class1"></div><i class="class1"></i><span class="class2"></span><div class="class1"></div></body></html>', 'html.parser')
# True matches every tag; the class list matches either class
for e in soup.findAll(True, {"class": ["class1", "class2"]}):
    # keep only the tag names we care about
    if e.name in ("div", "span"):
        print(e)
The filter can also be written as a one-liner:
[e for e in soup.findAll(True, {"class":["class1", "class2"]}) if e.name in ("div", "span")]
BTW, passing a list of tag names as the first argument also works:
soup.findAll(["div", "span"], {"class":["class1", "class2"]})
See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-name-argument for the documentation on what the first argument to findAll can be.
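One caveat: the list-based calls above match any listed tag with any listed class, so a div with the span's class would also be returned. If the exact pairing from the question matters, a function filter is one option. This is only a minimal sketch: the sample HTML and the helper name `wanted` are made up for illustration, and `value1`/`value2` are the class names from the question.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only
html = """
<div class="value2">first div</div>
<span class="value1">first span</span>
<span class="other">ignored span</span>
<div class="value1">ignored div</div>
<div class="value2">second div</div>
"""

soup = BeautifulSoup(html, "html.parser")

def wanted(tag):
    # "class" is multi-valued, so it comes back as a list (or None if absent)
    classes = tag.get("class") or []
    return (tag.name == "span" and "value1" in classes) or (
        tag.name == "div" and "value2" in classes
    )

# A single find_all pass walks the tree once, so results stay in document order
extract = soup.find_all(wanted)
texts = [tag.get_text() for tag in extract]
print(texts)
```

Because everything is collected in one pass instead of two concatenated searches, the div results are not pushed to the end of the list.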
Upvotes: 4