Reputation: 338
I have the following code:
for table in soup.findAll("table","tableData"):
    for row in table.findAll("tr"):
        data = row.findAll("td")
        url = data[0].a
        print type(url)
I get the following output:
<class 'bs4.element.Tag'>
That means url is an object of class Tag, so I should be able to read attributes from it.
But if I replace print type(url) with print url['href'], I get the following traceback:
Traceback (most recent call last):
  File "baseCreator.py", line 57, in <module>
    createStoresTable()
  File "baseCreator.py", line 46, in createStoresTable
    print url['href']
TypeError: 'NoneType' object has no attribute '__getitem__'
What is wrong? And how can I get the value of the href attribute?
Upvotes: 0
Views: 582
Reputation: 142106
I do like BeautifulSoup, but I personally prefer lxml.html (for not too wacky HTML) because of the ability to utilise XPath.
import lxml.html
page = lxml.html.parse('http://somesite.tld')
print page.xpath('//tr/td/a/@href')
Depending on the structure, you might need to use some form of XPath "axes" or predicates to narrow the match, though.
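As a rough sketch of what narrowing the query could look like: the example below assumes the table carries exactly class="tableData" (as in the question) and that the link sits in the first td of each row. The sample markup and the name hrefs_via_xpath are made up for illustration, not taken from the real page.

```python
import lxml.html

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<html><body>
<table class="tableData">
  <tr><th>Store</th><th>City</th></tr>
  <tr><td><a href="/stores/1">First</a></td><td>Berlin</td></tr>
  <tr><td>No link here</td><td>Paris</td></tr>
  <tr><td><a href="/stores/2">Second</a></td><td>Rome</td></tr>
</table>
<table class="other">
  <tr><td><a href="/ignored">Ignored</a></td></tr>
</table>
</body></html>
"""


def hrefs_via_xpath(html_text):
    """Return the href of the link in the first cell of each tableData row."""
    root = lxml.html.document_fromstring(html_text)
    # The predicate limits the search to the wanted table; td[1] is the
    # first td child of each row. Rows without such a link contribute nothing.
    return root.xpath('//table[@class="tableData"]//tr/td[1]/a/@href')


xpath_hrefs = hrefs_via_xpath(SAMPLE_HTML)
print(xpath_hrefs)
```

Because XPath only returns nodes that actually exist, header rows and cells without a link are skipped without any explicit None check.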
You can also use elementsoup as a parser; details are at http://lxml.de/elementsoup.html
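As for the traceback itself: the TypeError means url was None at the moment it was indexed. In BeautifulSoup, data[0].a evaluates to None when the first cell contains no a tag, so a likely explanation (an assumption, since the real page is not shown) is that some rows have a link and others do not. A minimal sketch that stays with BeautifulSoup and skips such rows, using made-up sample markup and a hypothetical function name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<table class="tableData">
  <tr><th>Store</th><th>City</th></tr>
  <tr><td><a href="/stores/1">First</a></td><td>Berlin</td></tr>
  <tr><td>No link here</td><td>Paris</td></tr>
  <tr><td><a href="/stores/2">Second</a></td><td>Rome</td></tr>
</table>
"""


def hrefs_via_soup(html_text):
    """Collect the first-cell href of every row, skipping rows without one."""
    soup = BeautifulSoup(html_text, "html.parser")
    hrefs = []
    for table in soup.find_all("table", "tableData"):
        for row in table.find_all("tr"):
            data = row.find_all("td")
            if not data:
                continue  # e.g. a header row that only has <th> cells
            url = data[0].a
            if url is None:
                continue  # first cell has no <a>; indexing it would raise TypeError
            href = url.get("href")  # .get returns None instead of raising KeyError
            if href is not None:
                hrefs.append(href)
    return hrefs


soup_hrefs = hrefs_via_soup(SAMPLE_HTML)
print(soup_hrefs)
```

The empty-list check also avoids an IndexError on rows that have no td at all.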
Upvotes: 2