Andrew

Reputation: 12442

Get webpage contents with Python?

I'm using Python 3.1, if that helps.

Anyway, I'm trying to get the contents of this webpage. I Googled for a bit and tried different things, but they didn't work. I'm guessing this should be an easy task, but... I can't get it. :/

Results of urllib, urllib2:

>>> import urllib2
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import urllib2
ImportError: No module named urllib2
>>> import urllib
>>> urllib.urlopen("http://www.python.org")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'
>>> 

Python 3 solution

Thank you, Jason. :D.

import urllib.request
page = urllib.request.urlopen('http://services.runescape.com/m=hiscore/ranking?table=0&category_type=0&time_filter=0&date=1519066080774&user=zezima')
print(page.read())
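
page.read() returns raw bytes, which is fine for printing but not for working with the page as text. A minimal sketch of decoding it, assuming the page is UTF-8 encoded (the real page may declare a different charset):

import urllib.request

page = urllib.request.urlopen('http://www.python.org')
html = page.read().decode('utf-8')  # decode bytes to str; utf-8 is an assumption
print(html)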

Upvotes: 87

Views: 219999

Answers (8)

Jonathan Hartley

Reputation: 16044

If you're writing a project which installs packages from PyPI, then the best and most common library to do this is requests. It provides lots of convenient but powerful features. Use it like this:

import requests
response = requests.get('http://hiscore.runescape.com/index_lite.ws?player=zezima')
print(response.status_code)
print(response.content)

But if your project does not install its own dependencies, i.e. is limited to things built-in to the standard library, then you should consult one of the other answers.
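
If you do go the requests route and want the call to fail loudly on HTTP errors or a hung server, here is a small hedged extension of the snippet above (the 10-second timeout and the exception handling are illustrative choices, not part of the original answer):

import requests

try:
    # timeout is an arbitrary illustrative value
    response = requests.get('http://hiscore.runescape.com/index_lite.ws?player=zezima', timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    print(response.text)         # decoded text; response.content is the raw bytes
except requests.RequestException as exc:
    print('request failed:', exc)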

Upvotes: 62

Chalist

Reputation: 3307

You can also use the faster_than_requests package. It's very fast and simple:

import faster_than_requests as r
content = r.get2str("http://test.com/")

Look at this comparison:

[benchmark comparison image]

Upvotes: 2

Suppose you want to GET a webpage's content. The following Python 2 code does it:

# -*- coding: utf-8 -*-
# python

# example of getting a web page

from urllib import urlopen
print urlopen("http://xahlee.info/python/python_index.html").read()
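
Since the question is about Python 3.1, the equivalent there is a one-liner as well (a sketch using the same URL; only the import location and print syntax change):

from urllib.request import urlopen
print(urlopen("http://xahlee.info/python/python_index.html").read())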

Upvotes: 0

Jason R. Coombs

Reputation: 42724

Because you're using Python 3.1, you need to use the new Python 3.1 APIs.

Try:

urllib.request.urlopen('http://www.python.org/')

Alternatively, it looks like you're working from Python 2 examples. Write the code in Python 2, then use the 2to3 tool to convert it. On Windows, 2to3.py is in \python31\tools\scripts. Can someone else point out where to find 2to3.py on other platforms?
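
As a rough sketch of what that conversion does (the file name fetch.py is made up for illustration), a Python 2 script such as

import urllib2
print urllib2.urlopen('http://www.python.org/').read()

run through 2to3 -w fetch.py comes out roughly as

import urllib.request, urllib.error
print(urllib.request.urlopen('http://www.python.org/').read())

which matches the Python 3.1 call above.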

Edit

These days, I write Python 2 and 3 compatible code by using six.

from six.moves import urllib
urllib.request.urlopen('http://www.python.org')

Assuming you have six installed, that runs on both Python 2 and Python 3.

Upvotes: 35

Martin Thoma

Reputation: 136865

A solution that works with both Python 2.X and Python 3.X:

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

url = 'http://hiscore.runescape.com/index_lite.ws?player=zezima'
response = urlopen(url)
# decode the response bytes to text (utf-8 assumed); on Python 3, str() alone
# would only give the bytes repr, e.g. "b'...'"
data = response.read().decode('utf-8')

Upvotes: 0

Zuko

Reputation: 2924

If you ask me, try this one:

import urllib2
resp = urllib2.urlopen('http://hiscore.runescape.com/index_lite.ws?player=zezima')

and read it the normal way, i.e.

page = resp.read()

Good luck though

Upvotes: 9

Joe Koberg

Reputation: 26759

Mechanize is a great package for "acting like a browser", if you want to handle cookie state, etc.

http://wwwsearch.sourceforge.net/mechanize/
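
A minimal sketch of the idea, assuming mechanize is installed (pip install mechanize; the URL is just an example). The Browser object keeps cookies across requests for you:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)   # optional: don't refuse pages disallowed by robots.txt
response = br.open('http://www.python.org/')
print(response.read())        # the page contents, with any cookies remembered in br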

Upvotes: 5

JasDev

Reputation: 736

You can use urllib2 and parse the HTML yourself.

Or try Beautiful Soup to do some of the parsing for you.
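
A minimal sketch of the Beautiful Soup route, assuming the beautifulsoup4 package is installed and using Python 3's urllib.request for the fetch (the title and link lookups are just illustrations):

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://www.python.org/').read()
soup = BeautifulSoup(html, 'html.parser')  # parse with the stdlib HTML parser
print(soup.title.string)                   # the page's <title> text
for link in soup.find_all('a'):
    print(link.get('href'))                # every link href on the page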

Upvotes: 2
