KoirN

BeautifulSoup bug?

I have the following code:

for table in soup.findAll("table","tableData"):
    for row in table.findAll("tr"):
        data = row.findAll("td")
        url = data[0].a
        print type(url)

I get the following output:

<class 'bs4.element.Tag'>

That means url is an object of class Tag, so I should be able to get attributes from it. But if I replace print type(url) with print url['href'], I get the following traceback:

Traceback (most recent call last):
  File "baseCreator.py", line 57, in <module>
    createStoresTable()
  File "baseCreator.py", line 46, in createStoresTable
    print url['href']
TypeError: 'NoneType' object has no attribute '__getitem__'

What is wrong, and how can I get the value of the href attribute?
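The traceback says url is None, and data[0].a returns None only when the first cell contains no <a> tag at all. One printed type() line does not rule that out, because other rows may print <class 'NoneType'>. A plausible cause, assumed here since the page's HTML isn't shown, is a header row or a plain-text cell without a link. The sketch below uses made-up sample HTML (the store paths and city names are hypothetical), skips such rows, and reads the attribute with Tag.get(), which returns None instead of raising when the attribute is missing. find_all is the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first row has no <a> in its first cell.
html = """
<table class="tableData">
  <tr><td>Store</td><td>City</td></tr>
  <tr><td><a href="/stores/1">First</a></td><td>Oslo</td></tr>
  <tr><td><a href="/stores/2">Second</a></td><td>Bergen</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for table in soup.find_all("table", class_="tableData"):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            continue  # a row of only <th> cells would otherwise raise IndexError
        link = cells[0].a  # None when the first cell contains no <a>
        if link is None:
            continue  # the case that produced the TypeError in the question
        hrefs.append(link.get("href"))  # None rather than KeyError if href is absent

print(hrefs)
```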

Answers (1)

Jon Clements

I do like BeautifulSoup, but for HTML that isn't too wacky I personally prefer lxml.html because it lets you use XPath.

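import lxml.html

# parse() accepts a filename, a file-like object or a URL
page = lxml.html.parse('http://somesite.tld')
# href of every <a> that is a direct child of a <td> inside a <tr>
print(page.xpath('//tr/td/a/@href'))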
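The snippet above fetches a live page. To try the same XPath offline, lxml.html.fromstring() parses an HTML string instead. The markup below is made up, and the second query is an assumption about the asker's page, built from the tableData class in the question: it keeps only links in the first cell of each row of that table.

```python
import lxml.html

# Made-up sample markup; the real page structure is unknown.
html = """
<html><body>
<table class="tableData">
  <tr><td>Store</td><td>City</td></tr>
  <tr><td><a href="/stores/1">First</a></td><td><a href="/map/1">map</a></td></tr>
  <tr><td><a href="/stores/2">Second</a></td><td>Bergen</td></tr>
</table>
</body></html>
"""

root = lxml.html.fromstring(html)

# Same query as above: every <a> directly inside any <td> of any <tr>.
all_links = root.xpath('//tr/td/a/@href')

# Narrowed: only the first cell of each row in the "tableData" table.
first_cell_links = root.xpath('//table[@class="tableData"]//tr/td[1]/a/@href')

print(all_links)
print(first_cell_links)
```

Rows without a link simply contribute nothing to the result, so there is no None to guard against.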

Depending on the page structure, you might need XPath axes (such as parent::, ancestor:: or following-sibling::) to pick out exactly the nodes you want.
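As one illustration of axes (the sample markup and the goal of pairing each store link with the city in the next cell are hypothetical), parent:: and following-sibling:: let you move from a matched <a> to neighbouring cells:

```python
import lxml.html

# Made-up sample: each row pairs a store link with a city name.
html = """
<table class="tableData">
  <tr><td><a href="/stores/1">First</a></td><td>Oslo</td></tr>
  <tr><td><a href="/stores/2">Second</a></td><td>Bergen</td></tr>
</table>
"""

root = lxml.html.fromstring(html)

pairs = []
for link in root.xpath('//table[@class="tableData"]//td/a'):
    # parent::td is the link's own cell; following-sibling::td[1] is the next cell.
    city = link.xpath('string(parent::td/following-sibling::td[1])')
    pairs.append((link.get('href'), city))

print(pairs)
```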

You can also have BeautifulSoup do the parsing and still get an lxml tree (the ElementSoup module, now lxml.html.soupparser); details at http://lxml.de/elementsoup.html
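A minimal sketch of that route, assuming both lxml and bs4 are installed: soupparser.fromstring() lets BeautifulSoup parse the markup and converts the result into lxml elements, so the same XPath applies. The sample below is made up and only mildly sloppy (unquoted attribute values).

```python
from lxml.html import soupparser

# Made-up sample with unquoted attribute values.
html = """
<table class=tableData>
  <tr><td><a href=/stores/1>First</a></td><td>Oslo</td></tr>
  <tr><td><a href=/stores/2>Second</a></td><td>Bergen</td></tr>
</table>
"""

# BeautifulSoup does the parsing; the return value is an lxml element.
root = soupparser.fromstring(html)
hrefs = root.xpath('//tr/td/a/@href')
print(hrefs)
```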
