Reputation: 2687
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import re
random.seed(datetime.datetime.now())
def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org" + articleUrl)
    bsObj = BeautifulSoup(html)
    return bsObj.find("div", {"id": "bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))
getLinks('http://en.wikipedia.org')
OS is Linux. The above script dies with a "urllib.error.URLError". I've looked through a number of attempted solutions that I found on Google, but none of them fixed my problem (attempts include changing the env variable and adding nameserver 8.8.8.8 to my resolv.conf file).
Upvotes: 2
Views: 9429
Reputation: 52203
You should call getLinks() with a valid url:

>>> getLinks('/wiki/Main_Page')
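The reason the original call fails: getLinks() prepends the domain itself, so passing a full URL produces a malformed address. A minimal illustration with plain string concatenation (no network access needed):

```python
base = "http://en.wikipedia.org"

# What getLinks('http://en.wikipedia.org') actually tried to fetch:
bad = base + "http://en.wikipedia.org"
print(bad)   # http://en.wikipedia.orghttp://en.wikipedia.org

# What getLinks('/wiki/Main_Page') fetches:
good = base + "/wiki/Main_Page"
print(good)  # http://en.wikipedia.org/wiki/Main_Page
```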
Besides, in your function, you should also call .read() to get the response content before passing it to BeautifulSoup:
>>> html = urlopen("http://en.wikipedia.org" + articleUrl).read()
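As a side note, the regex passed to findAll only keeps relative, internal article links: it requires the href to start with /wiki/ and contain no colon, which screens out namespace pages such as Category: or File:. A quick self-contained check of what it matches (the sample hrefs below are made up for illustration):

```python
import re

# The pattern from getLinks(): relative /wiki/ paths with no colon.
pattern = re.compile(r"^(/wiki/)((?!:).)*$")

hrefs = [
    "/wiki/Main_Page",             # plain article link
    "/wiki/Category:Physics",      # namespace page (contains a colon)
    "http://en.wikipedia.org",     # absolute URL, not a /wiki/ path
]

matched = [h for h in hrefs if pattern.match(h)]
print(matched)  # ['/wiki/Main_Page']
```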
Upvotes: 2