WeimusT

Reputation: 51

Python Beautiful Soup 'NoneType' object error

I am using Beautiful Soup to get the hyperlinks in the body of web pages. Here is the code I use:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.1914-1918.net/swb.htm'
element = 'body'
request = urllib2.Request(url)
page = urllib2.urlopen(request).read()
pageSoup = BeautifulSoup(page)
for elementSoup in pageSoup.find_all(element):
  for linkSoup in elementSoup.find_all('a'):
    print linkSoup['href']

I got an AttributeError when I tried to find hyperlinks for the swb.htm page.

AttributeError: 'NoneType' object has no attribute 'next_element'

I am sure the page has a body element with a couple of 'a' elements under it. Strangely, the same code works well for other pages (e.g. http://www.1914-1918.net/1div.htm).

This problem has been haunting me for days. Can anyone please point out what I did wrong?

Upvotes: 4

Views: 4492

Answers (2)

LeonPak

Reputation: 1

The installed beautifulsoup4 release may not be compatible with your Python version. Try uninstalling it with pip uninstall beautifulsoup4 and installing an older release with pip install beautifulsoup4==<version>. I use version 4.1.3.

Upvotes: -1

Thiago Argolo

Reputation: 11

This happens when html5lib is installed.

Try removing it and test again.

More details: https://bugs.launchpad.net/beautifulsoup/+bug/1184417
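
If uninstalling html5lib is not an option, a possible workaround is to name the parser explicitly. When no parser is given, Beautiful Soup picks the best one it finds installed, which can be html5lib. The sketch below passes Python's built-in 'html.parser' instead. The extract_links helper and the sample_html string are made up for illustration and stand in for the downloaded page. Whether this avoids the error on swb.htm itself is an assumption, not something tested here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; not the real swb.htm content.
sample_html = """
<html><body>
<a href="1div.htm">1st Division</a>
<a name="top">anchor without an href</a>
<a href="swb.htm">Silver War Badge</a>
</body></html>
"""


def extract_links(html, parser="html.parser"):
    """Return the href of every <a> under <body>, using the named parser."""
    # Naming the parser stops Beautiful Soup from silently choosing html5lib.
    soup = BeautifulSoup(html, parser)
    links = []
    for body in soup.find_all("body"):
        for anchor in body.find_all("a"):
            # .get() returns None for anchors without an href,
            # where anchor['href'] would raise a KeyError.
            href = anchor.get("href")
            if href is not None:
                links.append(href)
    return links


links = extract_links(sample_html)
print(links)
```

In the question's code, the equivalent change would be BeautifulSoup(page, 'html.parser') in place of BeautifulSoup(page).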

Upvotes: 1
