Reputation: 51
I am using Beautiful Soup to get the hyperlinks in the body of web pages. Here is the code I use:
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.1914-1918.net/swb.htm'
element = 'body'

request = urllib2.Request(url)
page = urllib2.urlopen(request).read()
pageSoup = BeautifulSoup(page)

for elementSoup in pageSoup.find_all(element):
    for linkSoup in elementSoup.find_all('a'):
        print linkSoup['href']
I got an AttributeError when I tried to find hyperlinks for the swb.htm page.
AttributeError: 'NoneType' object has no attribute 'next_element'
I am sure the page has a body element with a couple of 'a' elements under it. Strangely, the same code works well for other pages (e.g. http://www.1914-1918.net/1div.htm).
This problem has been haunting me for days. Can anyone please point out what I did wrong?
Upvotes: 4
Views: 4492
Reputation: 1
Maybe your beautifulsoup4 version does not suit your Python installation. Try removing it with pip uninstall beautifulsoup4 and then installing an older version with pip install beautifulsoup4==<version>. I use version 4.1.3.
Upvotes: -1
Reputation: 11
This happens when html5lib is installed, because Beautiful Soup may then pick it as the parser.
Try removing it and test again.
More details: https://bugs.launchpad.net/beautifulsoup/+bug/1184417
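If uninstalling html5lib is not an option, a possible workaround is to name the parser explicitly so that Beautiful Soup does not select html5lib on its own. This is only a sketch and assumes html5lib really is the cause here. The inline HTML is a made-up stand-in for the downloaded page so the example runs offline, and it is written for Python 3 rather than the Python 2 used in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real swb.htm content).
page = """
<html><body>
<a href="1div.htm">1st Division</a>
<a name="top">anchor without href</a>
<a href="swb.htm">Silver War Badge</a>
</body></html>
"""

# Passing 'html.parser' selects Python's built-in parser explicitly,
# so html5lib is not used even when it is installed.
page_soup = BeautifulSoup(page, 'html.parser')

links = []
for body in page_soup.find_all('body'):
    # href=True skips <a> tags that have no href attribute,
    # which would otherwise raise a KeyError on link['href'].
    for link in body.find_all('a', href=True):
        links.append(link['href'])

print(links)
```

In the question's code the equivalent change would be BeautifulSoup(page, 'html.parser') in place of BeautifulSoup(page).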
Upvotes: 1