I get all the text I want from an HTML file when I use BeautifulSoup like this:
category = soup.find_all("ol", {"class":"breadcrumb"})
catname = BeautifulSoup(str(category).strip()).get_text().encode("utf-8")
Output:
Home
Digital Goods
E-Books
But I want to skip the first category, i.e. 'Home'. I know I could simply replace that word with "", but my question is really how to get BeautifulSoup to return specific tags from the location I have singled out above.
The HTML code looks like this:
<ol class="breadcrumb">
<li><a href="http://fakeshop.com">Home</a></li>
<li><a href="http://fakeshop.com/category/51">Digital Goods</a></li>
<li><a href="http://fakeshop.com/category/98">E-Books</a></li>
</ol>
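For reference, here is a minimal, self-contained sketch of the setup above. It assumes BeautifulSoup 4 with Python's built-in `html.parser`. Calling `get_text()` on the `<ol>` tag itself returns the text of every `<li>` inside it, which is why 'Home' shows up. Re-parsing `str(category)` is not needed to get the text.

```python
from bs4 import BeautifulSoup

# The breadcrumb HTML from the question
html = """
<ol class="breadcrumb">
<li><a href="http://fakeshop.com">Home</a></li>
<li><a href="http://fakeshop.com/category/51">Digital Goods</a></li>
<li><a href="http://fakeshop.com/category/98">E-Books</a></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag; find_all() would return a list of Tags
breadcrumb = soup.find("ol", class_="breadcrumb")

# All text inside the <ol>, one string per line, whitespace-only strings dropped
text = breadcrumb.get_text("\n", strip=True)
print(text)
```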
Is there anything I can do to get the second and third 'li' tags from this 'breadcrumb' section, and not others in the file?
Example (which does not work but illustrates what I'm looking for):
category = soup.find_all("ol", {"class":"breadcrumb"}), find_all("li")[1:]
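The idea in that example can be written in working form with a loop. `find_all()` returns a list-like `ResultSet` of `<ol>` tags, so `find_all("li")` has to be called on each tag rather than on the result set. A sketch, assuming BeautifulSoup 4 and the HTML above:

```python
from bs4 import BeautifulSoup

html = """
<ol class="breadcrumb">
<li><a href="http://fakeshop.com">Home</a></li>
<li><a href="http://fakeshop.com/category/51">Digital Goods</a></li>
<li><a href="http://fakeshop.com/category/98">E-Books</a></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
# Each breadcrumb <ol> is a Tag, so find_all("li") can be chained on it
for ol in soup.find_all("ol", class_="breadcrumb"):
    # Slice off the first <li> ("Home") and keep the rest
    for li in ol.find_all("li")[1:]:
        names.append(li.get_text(strip=True))

print(names)  # ['Digital Goods', 'E-Books']
```

Because the slice is applied per `<ol>`, this also skips the first item of every breadcrumb if a page happens to contain more than one.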
Answer:
What about this?
category = soup.find("ol", {"class":"breadcrumb"}).find_all('li')[1:]
catname = BeautifulSoup(str(category).strip()).get_text().encode("utf-8")
My output is then:
[Digital Goods, E-Books]
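The brackets and commas in that output come from `str(category)`, which turns the Python list of `<li>` tags into a single string before it is re-parsed. Calling `get_text()` on each tag avoids that. A sketch, assuming BeautifulSoup 4 (whose `select()` method accepts CSS selectors) and the HTML from the question; the `select()` version is an alternative to the `find()` chain above, not part of it:

```python
from bs4 import BeautifulSoup

html = """
<ol class="breadcrumb">
<li><a href="http://fakeshop.com">Home</a></li>
<li><a href="http://fakeshop.com/category/51">Digital Goods</a></li>
<li><a href="http://fakeshop.com/category/98">E-Books</a></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selection as above: first breadcrumb <ol>, every <li> except the first
category = soup.find("ol", class_="breadcrumb").find_all("li")[1:]
names = [li.get_text(strip=True) for li in category]
print(names)             # ['Digital Goods', 'E-Books']
print("\n".join(names))  # one category per line, no brackets

# Alternative: CSS selector for <li> children of ol.breadcrumb, then slice
selected = [li.get_text(strip=True) for li in soup.select("ol.breadcrumb > li")[1:]]
print(selected)          # ['Digital Goods', 'E-Books']
```

Note that the `select()` slice is applied across all matches on the page, so with several breadcrumbs it only drops the very first `<li>`, whereas the `find()` chain looks at the first breadcrumb only.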