gina1752

Reputation: 23

Keep getting 'TypeError: 'NoneType' object is not callable' with beautiful soup and python3

I am a beginner struggling through a course, so this problem is probably really simple. I am running this (admittedly messy) code, saved as x.py, to extract a link and a name from a website whose lines are formatted like this:

<li style="margin-top: 21px;">
  <a href="http://py4e-data.dr-chuck.net/known_by_Prabhjoit.html">Prabhjoit</a>
</li>

So I set up this:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
for line in soup:
    if not line.startswith('<li'):
        continue
    stuff = line.split('"')
    link = stuff[3]
    thing = stuff[4].split('<')
    name = thing[0].split('>')
    count = count + 1
    if count == 18:
        break
print(name[1])
print(link)

And it keeps producing the error:

Traceback (most recent call last):
  File "x.py", line 15, in <module>
    if not line.startswith('<li'):
TypeError: 'NoneType' object is not callable

I have struggled with this for hours, and I would be grateful for any suggestions.

Upvotes: 0

Views: 2791

Answers (1)

Martijn Pieters

Reputation: 1122352

line is not a string, and it has no startswith() method. It is a BeautifulSoup Tag object, because BeautifulSoup has parsed the HTML source text into a rich object model. Don't try to treat it as text!

The error occurs because when you access an attribute that a Tag object doesn't know about, it searches for a child element with that name (so here it effectively executes line.find('startswith')). Since there is no element with that name, the lookup returns None, so line.startswith is None. Calling it, as in None('<li'), then fails with the error you see.
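To make the mechanism concrete, here is a minimal sketch using a toy class. ToyTag is a hypothetical stand-in written for illustration; it is not BeautifulSoup's actual implementation, and its find() only looks at direct children, but it reproduces the same attribute fallback and the same error message:

```python
class ToyTag:
    """Hypothetical stand-in for a parsed element (not BeautifulSoup's real code)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with this tag name, or None if absent.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails,
        # so unknown names are treated as "find a child with this name".
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)


line = ToyTag('li', [ToyTag('a')])
print(line.a.name)      # a child named 'a' exists, so this prints: a
print(line.startswith)  # no child named 'startswith', so this prints: None

try:
    line.startswith('<li')  # equivalent to None('<li')
except TypeError as exc:
    message = str(exc)
    print(message)
```

The final call fails because the attribute lookup has already produced None before the parentheses are evaluated, which is why the message complains about calling a NoneType rather than about a missing method.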

If you wanted to find the 18th <li> element, just ask BeautifulSoup for that specific element:

soup = BeautifulSoup(html, 'html.parser')
li_link_elements = soup.select('li a[href]', limit=18)
if len(li_link_elements) == 18:
    last = li_link_elements[-1]
    print(last.get_text())
    print(last['href'])

This uses a CSS selector to find only the <a> link elements that are nested inside a <li> element (at any depth, since 'li a' is a descendant selector) and that have an href attribute. The search is limited to the first 18 such tags, and the last one is printed, but only if the page actually contained 18 of them.

The element text is retrieved with the Tag.get_text() method, which includes text from any nested elements (such as <span> or <strong> or other extra markup), and the href attribute is accessed using standard indexing notation.
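As a small self-contained illustration of those calls, the sketch below parses an inline HTML string instead of fetching a page. The first <li> is the sample from the question; the second and third entries (including the example.com URL) are made up here to show nested markup and a link without an href:

```python
from bs4 import BeautifulSoup

# Inline sample; only the first <li> comes from the question,
# the other two are invented for this example.
html = """
<ul>
  <li style="margin-top: 21px;">
    <a href="http://py4e-data.dr-chuck.net/known_by_Prabhjoit.html">Prabhjoit</a>
  </li>
  <li><a href="http://example.com/known_by_Sample.html"><strong>Sam</strong>ple</a></li>
  <li><a>no href here</a></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

# 'li a[href]' matches <a> tags inside an <li> that carry an href attribute,
# so the third link is skipped; limit=2 stops after two matches.
links = soup.select('li a[href]', limit=2)

names = [a.get_text() for a in links]  # text, including nested <strong> text
hrefs = [a['href'] for a in links]     # attribute access by indexing

for name, href in zip(names, hrefs):
    print(name, href)
```

Indexing with a['href'] raises KeyError when the attribute is missing, which is why the selector filters on [href] first; a.get('href') is the alternative if a None result is preferred.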

Upvotes: 1
