Gloria Chen

Reputation: 73

Web Scraper problem[simple]: TypeError: object of type 'NoneType' has no len()

import requests
from bs4 import BeautifulSoup

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status() # generate error information
        r.encoding = r.apparent_encoding # could be revised to enhance the speed
        return r.next # return the HTML to other parts of the program
    except:
        return ""

def fillUnivKust(ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find('li').children:
        if isinstance(a, bs4.element.Tag): # avoid String type's data
            aaa = a('div') # There are only 2 divs here in this case
            ulist.append([aaa[0].string]) # aaa[0] -> Product's name

def printUnivList(ulist, num):
    for i in range(num):
        u = ulist[i] # u already have
        print(u[i]) # print the ith product's name

def main():
    uinfo = []
    url = 'https://www.cattelanitalia.com/en/products?c=new'
    html = getHTMLText(url)
    fillUnivKust(uinfo, html)
    printUnivList(uinfo, 25)

main()

I tried to write a simple Python web scraper; the code above is all of it. After running it, I received this error:

TypeError: object of type 'NoneType' has no len()

I don't know what is wrong.

-- Update --

I changed '''return r.next''' to '''return r.content'''

It generates this error:

IndexError: list index out of range

Again, I don't know why.

Upvotes: 1

Views: 879

Answers (2)

mdmjsh

Reputation: 945

Your issue is coming from this line:

soup = BeautifulSoup(html, "html.parser")

The TypeError when initialising the BeautifulSoup class tells us that BeautifulSoup tried to perform a len operation but was unable to do so on a NoneType object. Ergo, the data passed in as the html argument (i.e. the first positional argument) was a NoneType rather than an HTML doc.
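
The message itself can be reproduced without BeautifulSoup or any network access. The sketch below is not BeautifulSoup's actual code; measure is a hypothetical stand-in for any library function that calls len() on its input.

```python
def measure(markup):
    # Hypothetical stand-in for a parser that needs the length of its input.
    return len(markup)


def describe_failure(markup):
    # Return the error message raised for this input, or None if it worked.
    try:
        measure(markup)
    except TypeError as exc:
        return str(exc)
    return None


print(describe_failure(None))         # the same message as in the question
print(describe_failure("<li></li>"))  # None: a real string has a length
```

So whenever this message shows up on the BeautifulSoup(...) line, the first thing to check is what was actually passed in as html.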

So why is the HTML a NoneType? That arises from your getHTMLText function, specifically the line:

return r.next

is returning None for the URL you provided in main. requests.get() returns a requests.Response object, and its .next attribute is a "PreparedRequest for the next request in a redirect chain, if there is one" [source], i.e. not the HTML of the page. You probably want to update that line to:

return r.content

as per this tutorial
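
Once the function returns r.content, the IndexError from the question's update is a separate problem. The traceback is not shown, so this is an assumption, but one likely source is printUnivList: every entry appended to ulist is a one-element list, so u[i] fails as soon as i reaches 1, and ulist[i] fails if fewer than num entries were collected. (aaa[0] in fillUnivKust can fail the same way if a child tag contains no div.) A minimal sketch of a safer printing step, using a hypothetical helper first_names and made-up sample data:

```python
def first_names(ulist, num):
    # Each entry is a one-element list such as ["Some product"], so the name
    # is always at index 0. Slicing avoids running past the end of ulist.
    return [entry[0] for entry in ulist[:num]]


# Hypothetical data in the shape that fillUnivKust builds.
sample = [["Product A"], ["Product B"], ["Product C"]]

for name in first_names(sample, 25):
    print(name)
```

Asking for 25 names when only 3 were scraped now prints 3 lines instead of raising.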

A couple of side notes:

  1. If an exception is caught in getHTMLText you'll return an empty string, not HTML. BeautifulSoup accepts an empty string, but soup.find('li') will then return None, so the loop over .children will fail with an AttributeError.
  2. It is generally a bad idea to catch blanket exceptions - better is to catch the specific expected exception(s) to be raised in the situation, and allow all others to fail. See: this blog post
  3. Single-letter variable names are often awkward to use in a debugger, because some letters double as debugger commands (in pdb, for example, r means return). I recommend you rename your r variable and in general avoid single-character variable names; it will make life easier when you start using the debugger :)
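
Putting the side notes together, here is a rough sketch of how getHTMLText could look. The name fetch_html and the printed message are my own choices, not anything required by requests; requests.exceptions.RequestException is the base class for the errors requests raises itself.

```python
import requests


def fetch_html(url, timeout=30):
    # Return the page text, or "" if the request itself fails.
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
        return response.text
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors.
        # Anything else (e.g. a typo inside this function) still surfaces.
        print(f"Request failed: {exc}")
        return ""
```

The caller can then check for an empty result (if not html: ...) before building the soup, rather than letting the failure show up later in the parsing code.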

Upvotes: 1

Tadeusz Sznuk

Reputation: 1084

Looks like there is a typo in getHTMLText() - try replacing return r.next with return r.text.

Upvotes: 0
