Reputation: 79
Can someone please explain how findAll works in BeautifulSoup?
What I don't understand is this line: A = soup.findAll('strong',{'class':'name fn'}). It looks like it finds elements matching certain criteria,
but the original source of the webpage looks like ......<STRONG class="name fn">iPod nano 16GB</STRONG>......
How does ('strong',{'class':'name fn'}) pick it up? Thanks.
Original Python code:
from bs4 import BeautifulSoup
import urllib2
import re
url="http://m.harveynorman.com.au/ipods-audio-music/ipods/ipods"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
A = soup.findAll('strong',{'class':'name fn'})
for B in A:
    print B.renderContents()
Upvotes: 1
Views: 299
Reputation: 2017
From the BeautifulSoup docs:
Beautiful Soup provides many methods that traverse (go through) the parse tree, gathering Tags and NavigableStrings that match criteria you specify.
From the section "The basic find method: findAll(name, attrs, recursive, text, limit, **kwargs)":
The findAll method traverses the tree, starting at the given point, and finds all the Tag and NavigableString objects that match the criteria you give. The signature for the findAll method is this:
findAll(name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)
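The name argument restricts the match by tag name, for example the string 'strong' in your code; the docs list the other kinds of value it accepts.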
The keyword arguments impose restrictions on the attributes of a tag.
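To connect this to the question, here is a minimal sketch of how the two arguments line up with the markup. It assumes bs4 with the built-in "html.parser"; the HTML string is a hypothetical snippet modelled on the line quoted in the question, and the second strong tag is made up purely for contrast. The parser stores tag names in lower case, which is why 'strong' picks up <STRONG>, and the dictionary is compared against the tag's class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the line quoted in the question;
# the second <strong> tag is invented only to show a non-match.
html = (
    '<div>'
    '<STRONG class="name fn">iPod nano 16GB</STRONG>'
    '<strong class="other">not a product name</strong>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")

# name='strong' selects <strong> tags (the parser lower-cases <STRONG>);
# attrs={'class': 'name fn'} keeps only tags whose class attribute is "name fn".
matches = soup.findAll('strong', {'class': 'name fn'})

names = [tag.get_text() for tag in matches]
print(names)
```

Only the first tag should be returned, because the second one has the right name but a different class.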
It's very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, class, is also a Python reserved word, so it cannot be used directly as a keyword argument.
You could search by CSS class with soup.find("tagName", { "class" : "cssClass" }) (like the code you gave), but that's a lot of code for such a common operation. Instead, you can pass a string for attrs rather than a dictionary, and the string is used to restrict the CSS class.
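As a rough illustration of that shortcut, the sketch below runs the same search three ways. It assumes bs4, where find_all is the current spelling of findAll and class_ is the keyword form that sidesteps the reserved word; the HTML string is again a hypothetical snippet based on the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup based on the line quoted in the question.
html = '<p><strong class="name fn">iPod nano 16GB</strong></p>'
soup = BeautifulSoup(html, "html.parser")

# 1. Dictionary form, as in the question's code.
by_dict = soup.find_all("strong", {"class": "name fn"})

# 2. String form: a string passed as attrs is treated as a CSS class.
by_string = soup.find_all("strong", "name fn")

# 3. Keyword form (bs4): class_ avoids the reserved word "class".
by_keyword = soup.find_all("strong", class_="name fn")

texts = [[tag.get_text() for tag in result]
         for result in (by_dict, by_string, by_keyword)]
print(texts)
```

All three calls should return the same single tag, so which form to use is mostly a matter of readability.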
The docs have further examples to help you understand.
Upvotes: 2