Reputation: 9373
I have am trying to parse a webpage that looks like this with Python->Beautiful Soup:
I am trying to extract the contents of the highlighted td div. Currently I can get all the divs by
alltd = soup.findAll('td')
for td in alltd:
print td
But I am trying to narrow the scope of that to search the tds in the class "tablebox" which still will probably return 30+ but is more managable a number than 300+.
How can I extract the contents of the highlighted td in picture above?
Upvotes: 48
Views: 103792
Reputation: 8205
It is useful to know that whatever elements BeautifulSoup finds within one element still have the same type as that parent element - that is, various methods can be called.
So this is somewhat working code for your example:
soup = BeautifulSoup(html)
divTag = soup.find_all("div", {"class": "tablebox"})
for tag in divTag:
tdTags = tag.find_all("td", {"class": "align-right"})
for tag in tdTags:
print tag.text
This will print all the text of all the td
tags with the class of "align-right" that have a parent div
with the class of "tablebox".
Upvotes: 85