Javier Martin

Reputation: 363

TypeError: 'NoneType' object is not iterable using BeautifulSoup

I am pretty new to Python and this could be a very simple error, but I can't work out what's wrong. I am trying to get the links from a website that contain a specific substring, but I get "TypeError: 'NoneType' object is not iterable" when I do it. I believe the problem is related to the links I get from the website. Does anybody know what the problem is here?

from bs4 import BeautifulSoup
from urllib.request import urlopen

html_page = urlopen("http://www.scoresway.com/?sport=soccer&page=competition&id=87&view=matches")
soup = BeautifulSoup(html_page, 'html.parser')
lista=[]
for link in soup.find_all('a'):
    lista.append(link.get('href'))

for text in lista:
    if "competition" in text:
        print (text)

Upvotes: 0

Views: 3632

Answers (3)

t.m.adam

Reputation: 15376

You're getting a TypeError exception because some 'a' tags don't have an 'href' attribute, so get('href') returns None, which is not iterable.

You can fix this by replacing this:

soup.find_all('a')

with this:

soup.find_all('a', href=True)

to ensure that all your links have an 'href' attribute.
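
As a rough, self-contained sketch of this fix: the HTML snippet, its paths, and the competition_links helper below are made up for illustration and are not taken from the real page. The only change that matters is href=True.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second <a> tag has no href attribute.
SAMPLE_HTML = """
<a href="/competition/87/matches">Matches</a>
<a name="top">Top</a>
<a href="http://twitter.com/scoresway">Twitter</a>
"""


def competition_links(html):
    """Return the href values that contain the substring 'competition'."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only the <a> tags that actually carry an href,
    # so every value below is a string and never None.
    hrefs = [link["href"] for link in soup.find_all("a", href=True)]
    return [href for href in hrefs if "competition" in href]


matching = competition_links(SAMPLE_HTML)
print(matching)
```

With the real page you would pass the response from urlopen instead of SAMPLE_HTML.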

Upvotes: 2

Dmitry

Reputation: 2096

In the line lista.append(link.get('href')), the expression link.get('href') can return None. Later you evaluate "competition" in text, where text may be None, and None is not an iterable object. To avoid this, pass a default value to get(): link.get('href', '') returns the empty string '' when the attribute is missing, and an empty string is iterable.
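
To see the mechanism without fetching anything, here is a small sketch in plain Python. The hrefs list is an invented stand-in for what link.get('href') might return, and replacing None with '' after the fact has the same effect as supplying '' as the default to get().

```python
# Hypothetical values standing in for the collected href attributes.
hrefs = [None, "http://facebook.com/scoresway", "/competition/87/matches", None]

# A membership test against None raises TypeError.
raised = False
try:
    "competition" in None
except TypeError:
    raised = True

# Substituting an empty string for None makes every item safe to test.
safe_hrefs = [href or "" for href in hrefs]
matches = [href for href in safe_hrefs if "competition" in href]

print(raised)
print(matches)
```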

Upvotes: 1

Charul

Reputation: 450

I found mistakes in two places.

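First of all, if you are running Python 2, the urllib module doesn't have a request submodule, so the import fails there. On Python 3, urllib.request exists and the original import is correct.

 from urllib.request import urlopen
 # on Python 2 only, this should be
 from urllib import urlopen

The second one is that when you fetch the links from the page, link.get('href') returns None for a few of them.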

 print(lista) 
 # prints [None, u'http://facebook.com/scoresway', u'http://twitter.com/scoresway', ...., None]

As you can see, your list contains None values, and that's why you get "TypeError: 'NoneType' object is not iterable" when you iterate over it and test each item.

How to fix it? You should keep the None values out of the list.

  for link in soup.find_all('a'):
      href = link.get('href')
      if href is not None:  # Add this check
          lista.append(href)

Upvotes: -1
