Al Lopez Kasi

Reputation: 11

How come I get "ERR_ACCESS_DENIED" when I use Python to open a URL (Wikipedia)?

This is my first ever question posted here, so I'm hoping for a little leniency as a newbie.

I'm learning Python for my Computer Science class in high school, so I have little experience using it to solve problems. Right now I'm working on something that gets from a random Wikipedia page to a targeted page (also on Wikipedia) by following the links on each page. This is my first time using anything like urllib, so for now I'm only using what my teacher told me to use.

I've got a bit of code that should be able to open pages on Wikipedia, but I keep getting back a page that says something about a technical error. Opening Wikipedia in a browser works fine, though.

I don't know what I need to get it to work, and I no longer have any idea where to look to figure this out.

My code (using IDLE on Ubuntu 11.04 with Python 2.7):

import urllib
import HTMLParser

class Parser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Start:", tag, attrs)

    def handle_endtag(self, tag):
        print("End:", tag)
    def handle_data(self, data):
        print ("Data:", data)

#proxy = {"http": "http://10.102.0.3:3128"}
browser = urllib.FancyURLopener()  #(proxies = proxy)
# The commented-out stuff is for the proxy at school.
# Both at home and at school, I come up with errors.
f = browser.open("http://en.wikipedia.org/wiki/Special:Random")
p = Parser()
print p.feed(f.read())

My output looks like this:

('Data:', '\n') ('Start:', 'html', [('xmlns', 'http://www.w3.org/1999/xhtml'), ('xml:lang', 'en'), ('lang', 'en')]) ('Data:', '\n') ('Start:', 'head', []) ('Data:', '\n') ('Start:', 'title', []) ('Data:', 'Wikimedia Error') ('End:', 'title') ('Data:', '\n') ('Start:', 'meta', [('http-equiv', 'Content-Type'), ('content', 'text/html; charset=UTF-8')]) ('End:', 'meta') ('Data:', '\n') ('Start:', 'meta', [('name', 'author'), ('content', 'Mark Ryan')]) ('End:', 'meta') ('Data:', '\n') ('Start:', 'meta', [('name', 'copyright'), ('content', '(c) 2005-2007 Mark Ryan and others. Text licensed under the GNU Free Documentation License. http://www.gnu.org/licenses/fdl.txt')]) ('End:', 'meta') ('Data:', '\n\n') ('Start:', 'style', [('type', 'text/css')]) ('Data:', '\n') ('End:', 'style') ('Data:', '\n') ('Start:', 'script', []) ('Data:', '//\n\tfunction lines(s) {\n\t\tvar c = s.split(\' \');\n\t\tfor (var i = 0; i < c.length; i++) {\n\t\t\tdocument.write(\'') ('End:', 'div') ('Data:', "');\n\t\t}\n\t}\n//]]>") ('End:', 'script') ('Data:', '\n') ('End:', 'head') ('Data:', '\n\n') ('Start:', 'body', [('link', '#24442E'), ('text', '#000000'), ('vlink', '#24442E'), ('alink', '#FF0000')]) ('Data:', '\n') ('Start:', 'h1', []) ('Data:', 'Wikimedia Foundation') ('End:', 'h1') ('Data:', '\n') ('Start:', 'script', []) ('Data:', "lines('ccd4cf bdc3bf adb1af 9ea09f dbe5df');") ('End:', 'script') ('Data:', '\n\n') ('Start:', 'h2', []) ('Data:', 'Error') ('End:', 'h2') ('Data:', '\n\n') ('Start:', 'script', []) ('Data:', "lines('8f8f8f acacac c6c6c6 dbdbdb eaeaea f4f4f4');") ('End:', 'script') ('Data:', '\n\n') ('Data:', '\n') ('Start:', 'div', [('class', 'ContentArea')]) ('Data:', '\n\n') ('Start:', 'div', [('id', 'en'), ('lang', 'en')]) ('Data:', '\n') ('Start:', 'p', []) ('Data:', 'Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please ') ('Start:', 'a', [('href', 'http://en.wikipedia.org/wiki/Special:Random'), ('onclick', 'window.location.reload(false); return false')]) ('Data:', 'try again') ('End:', 'a') ('Data:', ' in a few minutes.') ('End:', 'p') ('Data:', '\n') ('Start:', 'p', []) ('Data:', 'You may be able to get further information in the ') ('Start:', 'a', [('href', 'irc://chat.freenode.net/wikipedia')]) ('Data:', '#wikipedia') ('End:', 'a') ('Data:', ' channel on the ') ('Start:', 'a', [('href', 'http://www.freenode.net')]) ('Data:', 'Freenode IRC network') ('End:', 'a') ('Data:', '.') ('End:', 'p') ('Data:', '\n') ('Start:', 'p', []) ('Data:', 'The Wikimedia Foundation is a non-profit organisation which hosts some of the most popular sites on the Internet, including Wikipedia. It has a constant need to purchase new hardware. 
If you would like to help, please ') ('Start:', 'a', [('href', 'http://wikimediafoundation.org/wiki/Fundraising')]) ('Data:', 'donate') ('End:', 'a') ('Data:', '.') ('End:', 'p') ('Data:', '\n') ('Start:', 'hr', [('noshade', 'noshade'), ('size', '1px'), ('width', '80%')]) ('End:', 'hr') ('Data:', '\n') ('Start:', 'div', [('class', 'TechnicalStuff')]) ('Data:', '\nIf you report this error to the Wikimedia System Administrators, please include the details below.') ('Start:', 'br', []) ('End:', 'br') ('Data:', '\n') ('End:', 'div') ('Data:', '\n') ('Start:', 'div', [('class', 'TechnicalStuff')]) ('Data:', '\n') ('Start:', 'bdo', [('dir', 'ltr')]) ('Data:', '\nRequest: GET http://en.wikipedia.org/wiki/Special:Random, from 112.205.80.8 via sq72.wikimedia.org (squid/2.7.STABLE9) to ()') ('Start:', 'br', []) ('End:', 'br') ('Data:', '\nError: ERR_ACCESS_DENIED, errno [No Error] at Mon, 06 Feb 2012 11:58:50 GMT\n') ('End:', 'bdo') ('Data:', '\n') ('End:', 'div') ('Data:', '\n') ('End:', 'div') ('Data:', '\n\n') ('End:', 'div') ('Data:', '\n') ('Start:', 'script', []) ('Data:', "lines('9ea09f adb1af bdc3bf ccd4cf');") ('End:', 'script') ('Data:', '\n\n') ('End:', 'body') ('Data:', '\n') ('End:', 'html') ('Data:', '\n\n') None

Upvotes: 1

Views: 1663

Answers (3)

Alexander

Reputation: 1

Make sure you forge your user agent. Wikipedia doesn't like Python or Perl, so any User-Agent that starts with "lwp" or "python-urllib" will get a "temporary" "technical problem."

The code provided by eviltnan does this. He didn't explain why he forged the user agent, though, so I wanted to point out that it isn't necessary for most sites, but it is for Wikipedia.
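For example, here is a minimal sketch using the asker's own urllib setup; the "Mozilla/5.0" string is just a hypothetical browser-like value, and any plausible browser User-Agent should work:

import urllib

class BrowserLikeOpener(urllib.FancyURLopener):
    # urllib sends this class attribute as the User-Agent header;
    # the default looks like "Python-urllib/x.y", which Wikipedia rejects.
    version = "Mozilla/5.0"

browser = BrowserLikeOpener()
f = browser.open("http://en.wikipedia.org/wiki/Special:Random")
print f.read(200)  # should now be an article, not the error page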

Upvotes: 0

Jiayi Tang

Reputation: 11

Consider using the actual API. Try this:

import urllib2
urllib2.urlopen("http://en.wikipedia.org/w/api.php?action=parse&format=txt&page=**Your_Page_Here**&prop=text")

It should return an HTML document containing the text of the article.
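For completeness, a rough sketch of actually reading that response; the page title here is a made-up example, and the User-Agent header is added per the other answers in case the default one is still blocked:

import urllib2

# Hypothetical example page; substitute the article you want.
url = ("http://en.wikipedia.org/w/api.php"
       "?action=parse&format=txt&page=Albert_Einstein&prop=text")
request = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
print urllib2.urlopen(request).read()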

Upvotes: 1

Thorin Schiffer

Reputation: 2846

Try using urllib2 and adding headers like this. At least you won't get the 403. In your case, use

 opener = urllib2.build_opener()
 opener.addheaders = [('User-agent', 'Mozilla/5.0')]
 f = opener.open("http://en.wikipedia.org/wiki/Special:Random")

instead of

f = browser.open("http://en.wikipedia.org/wiki/Special:Random")

and don't forget to import the library. Good luck!
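Putting it together, a minimal complete sketch (nothing beyond the standard library is assumed):

import urllib2

opener = urllib2.build_opener()
# Pretend to be a browser; the default "Python-urllib" agent is blocked.
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
f = opener.open("http://en.wikipedia.org/wiki/Special:Random")
print f.read(300)  # first 300 bytes, just to check it works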

Upvotes: 1
