Chaitanya Nettem

Reputation: 1239

TypeError : 'NoneType' object not callable when using split in Python with BeautifulSoup

I was playing around with the BeautifulSoup and Requests APIs today, so I thought I would write a simple scraper that follows links to a depth of 2 (if that makes sense). All the links in the webpage that I am scraping are relative (e.g. <a href="/free-man-aman-sethi/books/9788184001341.htm" title="A Free Man">), so to make them absolute I thought I would join the page URL with the relative links using urljoin.

To do this I had to first extract the href value from the <a> tags and for that I thought I would use split:

#!/bin/python
#crawl.py
import requests
from bs4 import BeautifulSoup
from urlparse import urljoin

html_source=requests.get("http://www.flipkart.com/books")
soup=BeautifulSoup(html_source.content)
links=soup.find_all("a")
temp=links[0].split('"')

This gives the following error:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    temp=links[0].split('"')
TypeError: 'NoneType' object is not callable

Having dived in before properly going through the documentation, I realize that this is probably not the best way to achieve my objective, but why is there a TypeError?

Upvotes: 3

Views: 8727

Answers (3)

Ollie

Reputation: 199

I just encountered the same error - so for what it's worth four years later: if you need to split up the soup element you can also use str() on it before you split it. In your case that would be:

    temp = str(links[0]).split('"')

Upvotes: 1

Jon Clements

Reputation: 142146

The Tag class proxies attribute access (as Pavel points out, this is how child elements are reached where possible), so when no matching child is found, the default None is returned.

A contrived example:

>>> print soup.find_all('a')[0].bob
None
>>> print soup.find_all('a')[0].foobar
None
>>> print soup.find_all('a')[0].split
None

You need to use:

soup.find_all('a')[0].get('href')

Where:

>>> print soup.find_all('a')[0].get
<bound method Tag.get of <a href="test"></a>>
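
To see the mechanism in isolation, here is a minimal sketch that needs no BeautifulSoup at all. FakeTag is a hypothetical stand-in written for illustration, not bs4's actual implementation; it only assumes that unknown attribute names are forwarded to a child lookup, as described above. It is written for Python 3, so print is a function.

```python
class FakeTag:
    """Hypothetical stand-in for bs4's Tag; not the real implementation."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = attrs or {}
        self.children = children or []

    def find(self, name):
        # Return the first child with the given tag name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def get(self, key, default=None):
        # A real method: looks up an HTML attribute of the tag.
        return self.attrs.get(key, default)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so link.split ends up here and becomes find("split").
        return self.find(name)


link = FakeTag("a", attrs={"href": "/books"}, children=[FakeTag("b")])

print(link.b.name)       # child tag reached through the proxy
print(link.split)        # None: there is no <split> child
print(link.get("href"))  # /books

try:
    link.split('"')      # equivalent to None('"')
except TypeError as exc:
    print(exc)           # 'NoneType' object is not callable
```

Because get is defined on the class, normal lookup finds it and `__getattr__` is never consulted, which is why get works while split silently turns into None.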

Upvotes: 1

Pavel Anossov

Reputation: 62908

links[0] is not a string, it's a bs4.element.Tag. When you try to look up split on it, it does its magic and tries to find a subelement named split, but there is none, so the lookup returns None. You are then calling that None.

In [10]: l = links[0]

In [11]: type(l)
Out[11]: bs4.element.Tag

In [17]: print l.split
None

In [18]: None()   # :)

TypeError: 'NoneType' object is not callable

Use indexing to look up HTML attributes:

In [21]: links[0]['href']
Out[21]: '/?ref=1591d2c3-5613-4592-a245-ca34cbd29008&_pop=brdcrumb'

Or use get if the attribute might not exist:

In [24]: links[0].get('href')
Out[24]: '/?ref=1591d2c3-5613-4592-a245-ca34cbd29008&_pop=brdcrumb'


In [26]: print links[0].get('wharrgarbl')
None

In [27]: print links[0]['wharrgarbl']

KeyError: 'wharrgarbl'
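
Once the href values are extracted this way, the original goal of making them absolute might look roughly like the sketch below. The helper name absolutize is made up for illustration, and the hrefs list is a hand-written stand-in for [link.get('href') for link in links]; the base URL and the relative link are the ones from the question. It targets Python 3, where urljoin lives in urllib.parse rather than urlparse.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin


def absolutize(base_url, hrefs):
    """Join each relative href with base_url, skipping missing ones."""
    # link.get('href') returns None for <a> tags without an href;
    # None and empty strings are dropped here.
    return [urljoin(base_url, href) for href in hrefs if href]


base_url = "http://www.flipkart.com/books"
# Stand-in for [link.get('href') for link in links]
hrefs = ["/free-man-aman-sethi/books/9788184001341.htm", None]

for url in absolutize(base_url, hrefs):
    print(url)
# http://www.flipkart.com/free-man-aman-sethi/books/9788184001341.htm
```

Using get rather than indexing here means anchors without an href yield None instead of raising KeyError, so they can simply be filtered out.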

Upvotes: 6
