Reputation: 363
I am pretty new to Python and this could be a very simple error, but I can't work out what's wrong. I am trying to get the links from a website that contain a specific substring, but I get "TypeError: 'NoneType' object is not iterable" when I do. I believe the problem is related to the links I get from the website. Does anybody know what the problem is?
from bs4 import BeautifulSoup
from urllib.request import urlopen

html_page = urlopen("http://www.scoresway.com/?sport=soccer&page=competition&id=87&view=matches")
soup = BeautifulSoup(html_page, 'html.parser')
lista = []
for link in soup.find_all('a'):
    lista.append(link.get('href'))
for text in lista:
    if "competition" in text:
        print(text)
Upvotes: 0
Views: 3632
Reputation: 15376
You're getting a TypeError exception because some 'a' tags don't have an 'href' attribute, so get('href') returns None, which is not iterable.
You can fix this by replacing this:
soup.find_all('a')
with this:
soup.find_all('a', href=True)
to ensure that all your links have an 'href' attribute.
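As a minimal sketch of that fix, here is the same filter applied to a small inline snippet instead of the live page. The markup and the paths in it are made up for illustration; only find_all('a', href=True) comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<a name="top">anchor without href</a>
<a href="/soccer/competition/87/matches">Competition</a>
<a href="http://twitter.com/scoresway">Twitter</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Without the filter, the first tag has no href, so get() returns None.
all_hrefs = [link.get("href") for link in soup.find_all("a")]

# href=True keeps only the tags that actually carry an href attribute.
hrefs = [link["href"] for link in soup.find_all("a", href=True)]

# The substring test is now safe because every item is a string.
matches = [h for h in hrefs if "competition" in h]
print(matches)
```

Because the filtering happens inside find_all, the rest of the loop from the question can stay as it is.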
Upvotes: 2
Reputation: 2096
In the line lista.append(link.get('href')), the expression link.get('href') can return None. After that you evaluate "competition" in text, where text can be None, which is not an iterable object. To avoid this, use link.get('href', '') to set the default value of get() to the empty string '', which is iterable.
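A rough sketch of why the default helps: a tag's get() takes a default the same way dict.get() does, so plain dicts are used here as stand-ins for each tag's attributes. The sample data is hypothetical and not taken from the real page.

```python
# Plain dicts standing in for the attributes of each <a> tag (made-up data).
tags = [
    {"name": "top"},                         # no href at all
    {"href": "/soccer/competition/87"},
    {"href": "http://twitter.com/scoresway"},
]

# With '' as the default, a missing href gives an empty string, never None.
lista = [attrs.get("href", "") for attrs in tags]

# A substring test against '' is simply False, so the loop cannot fail.
matches = [text for text in lista if "competition" in text]
print(matches)

# By contrast, the same test against None raises TypeError.
try:
    "competition" in None
    raised = False
except TypeError:
    raised = True
print(raised)
```

One trade-off of this approach is that the list still contains empty strings for tags without an href; they are harmless for the substring test but may need skipping elsewhere.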
Upvotes: 1
Reputation: 450
I found mistakes in two places.
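First, check the import against your Python version. The urllib.request submodule exists only in Python 3; in Python 2, urlopen lives directly in the urllib module.
from urllib.request import urlopen  # Python 3
# on Python 2 this should be
from urllib import urlopen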
Second, when you fetch the links from the page, link.get('href') returns None for a few of them.
print(lista)
# prints [None, u'http://facebook.com/scoresway', u'http://twitter.com/scoresway', ...., None]
As you can see, your list contains None entries, and that's why, when you iterate over it and test "competition" in text, you get "TypeError: 'NoneType' object is not iterable".
How to fix it?
You should keep the None values out of the list.
for link in soup.find_all('a'):
    href = link.get('href')
    if href is not None:  # Add this check: the href, not the tag, can be None
        lista.append(href)
Upvotes: -1