BeautifulSoup: Get generic tags from a specific class only

Question

I get all the text I want from an HTML file when I use beautifulsoup like this:

category = soup.find_all("ol", {"class":"breadcrumb"})
catname = BeautifulSoup(str(category).strip()).get_text().encode("utf-8")

Output:

Home
Digital Goods
E-Books

But I want to skip the first category, i.e. 'Home'. I know that I can simply replace that word with "", but my question is really about how I can get BeautifulSoup to return only specific tags within the location I have singled out above.

The HTML code looks like this:


Home
Digital Goods
E-Books

Is there anything I can do to get the second and third 'li' tags from this 'breadcrumb' section, and not others in the file?

Example (which does not work but illustrates what I'm looking for):

category = soup.find_all("ol", {"class":"breadcrumb"}), find_all("li")[1:]

steph · Accepted Answer

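What about this? find() returns a single tag rather than a list, so you can call find_all() on it and slice off the first item:

category = soup.find("ol", {"class":"breadcrumb"}).find_all('li')[1:]
catname = BeautifulSoup(str(category).strip()).get_text().encode("utf-8")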

My output is then:

[Digital Goods, E-Books]
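
If you would rather have a clean list of names than the string form of a list, something like the following might work. This is only a sketch: the markup in `html` is my guess at the structure described in the question (the real page may have links or other tags inside each `li`), and `breadcrumb`, `items`, and `catnames` are names I made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed from the question's description;
# the real page may nest <a> tags or other elements inside each <li>.
html = """
<ol class="breadcrumb">
  <li>Home</li>
  <li>Digital Goods</li>
  <li>E-Books</li>
</ol>
<ul><li>Unrelated item</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so find_all() can be chained on it.
breadcrumb = soup.find("ol", class_="breadcrumb")

# Only <li> tags inside the breadcrumb are searched; [1:] skips 'Home'.
items = breadcrumb.find_all("li")[1:] if breadcrumb is not None else []

# Read the text of each <li> directly instead of re-parsing str(list).
catnames = [li.get_text(strip=True) for li in items]
print(catnames)
```

With the assumed markup this prints ['Digital Goods', 'E-Books'], and the `li` under the unrelated `ul` is left out because the search starts from the breadcrumb tag rather than from the whole document. A CSS selector such as soup.select("ol.breadcrumb li")[1:] should give the same tags if you prefer that style.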
