How does `findAll` work in BeautifulSoup?

Can someone please explain how findAll works in BeautifulSoup?

My question is about this line: A = soup.findAll('strong',{'class':'name fn'}). It looks like it finds elements matching certain criteria.

But the original HTML of the web page looks like ......<STRONG class="name fn">iPod nano 16GB</STRONG>......

How does ('strong',{'class':'name fn'}) pick it up? Thanks.

Original Python code:

from bs4 import BeautifulSoup
import urllib2
import re

url="http://m.harveynorman.com.au/ipods-audio-music/ipods/ipods"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
A = soup.findAll('strong',{'class':'name fn'})
for B in A:
    print B.renderContents()

Upvotes: 1

Views: 299

Answers (1)

LotusUNSW

Reputation: 2017

From the Beautiful Soup docs:

Beautiful Soup provides many methods that traverse (go through) the parse tree, gathering Tags and NavigableStrings that match criteria you specify.

From the section "The basic find method: findAll(name, attrs, recursive, text, limit, **kwargs)":

The findAll method traverses the tree, starting at the given point, and finds all the Tag and NavigableString objects that match the criteria you give. The signature for the findAll method is this:

findAll(name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

The name argument can be used to pass in a:

  • a tag name (e.g. 'b' to match <b> tags)
  • a regular expression
  • a list or dictionary
  • the value True
  • a callable object
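
As a rough sketch of the name argument, the snippet below runs several of these forms against a small made-up HTML string (the markup is only an illustration, not the real page). It uses find_all, which is the bs4 spelling of findAll, and assumes the built-in "html.parser" backend. The dictionary form is left out because it comes from the older docs quoted here.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<div><b>bold</b><i>italic</i><strong class='name fn'>iPod nano 16GB</strong></div>"
soup = BeautifulSoup(html, "html.parser")

by_string = soup.find_all("b")                 # exact tag name
by_regex = soup.find_all(re.compile("^b"))     # tag names starting with "b"
by_list = soup.find_all(["b", "i"])            # any tag name in the list
by_true = soup.find_all(True)                  # every tag in the tree
by_callable = soup.find_all(lambda tag: tag.has_attr("class"))  # custom test

for label, result in [("string", by_string), ("regex", by_regex),
                      ("list", by_list), ("True", by_true),
                      ("callable", by_callable)]:
    print(label, [tag.name for tag in result])
```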

The keyword arguments impose restrictions on the attributes of a tag.
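
For instance, here is a minimal sketch of filtering on attributes with keyword arguments. The links and ids are invented for the example, and find_all is the bs4 name for findAll.

```python
import re
from bs4 import BeautifulSoup

# Invented links, used only to show attribute filters.
html = ('<a id="first" href="/ipods/nano">nano</a>'
        '<a id="second" href="/phones/x">x</a>')
soup = BeautifulSoup(html, "html.parser")

# Keyword arguments are matched against attributes of each tag.
by_id = soup.find_all("a", id="second")
# An attribute value can also be a regular expression.
by_href = soup.find_all(href=re.compile("^/ipods/"))

print([a["href"] for a in by_id])
print([a.get_text() for a in by_href])
```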

It's very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, class, is also a Python reserved word.

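You could search by CSS class with soup.find("tagName", { "class" : "cssClass" }) (like the code you gave), but that's a lot of code for such a common operation. Instead, you can pass a string for attrs instead of a dictionary.

In your code, 'strong' is the name argument and {'class':'name fn'} is the attrs argument, so findAll returns every strong tag whose class attribute is "name fn". The page source writes the tag as <STRONG>, but HTML tag names are case-insensitive and the parser converts them to lowercase, which is why 'strong' still matches.

The docs have further examples to help you understand.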
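
To see this on the tag from your question, here is a small sketch that parses a hard-coded string instead of fetching the page. The second strong tag and its class are made up to show what gets filtered out. It assumes bs4 with the built-in "html.parser", and uses find_all (the bs4 name for findAll) plus the class_ keyword that bs4 offers to get around the reserved word.

```python
from bs4 import BeautifulSoup

# First tag is from the question; the second one is made up for contrast.
html = ('<p><STRONG class="name fn">iPod nano 16GB</STRONG>'
        '<strong class="other">something else</strong></p>')
soup = BeautifulSoup(html, "html.parser")  # tag names are lowercased on parse

# Dictionary form: matches the exact class string "name fn".
with_dict = soup.find_all("strong", {"class": "name fn"})
# String form: a string passed as attrs is treated as a CSS class.
with_string = soup.find_all("strong", "fn")
# Keyword form: class_ avoids the reserved word "class".
with_keyword = soup.find_all("strong", class_="name")

names = [tag.get_text() for tag in with_dict]
print(names)
```

All three calls pick up the same tag here, because "name" and "fn" are both classes on it.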

Upvotes: 2
