TIMEX

Reputation: 272094

Will this urllib2 Python code download the file?

urllib2.urlopen(theurl).read() downloads the file.

Does urllib2.urlopen(theurl).geturl() download the file as well? If so, how long does it take?

Upvotes: 0

Views: 1288

Answers (5)

Ahmad Dwaik

Reputation: 971

urllib2.urlopen() returns a file-like object. Calling urlopen() sends the request, and calling read() on the result loads the document into your machine's memory. You can then use ordinary file functions to write it to a local file, like so:

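# Store the python.org front page in the local file d:\python.org.html
from urllib2 import urlopen

doc = urlopen("http://www.python.org")
html = doc.read()  # the whole body is read into memory here
doc.close()
f = open("d:/python.org.html", "wb")  # binary mode, so Windows does not rewrite line endings
f.write(html)
f.close()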

Or simply use urllib:

import urllib
urllib.urlretrieve("http://www.python.org","d:/python.org.html")

hope that helps ;)

Upvotes: 2

RichieHindle

Reputation: 281675

Tested with Wireshark and Python 2.5: urllib2.urlopen(theurl).geturl() downloads some of the body. It issues a GET, reads the headers and a couple of kilobytes of the body, and then stops.
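
To see similar behaviour without a packet sniffer, here is a minimal sketch. It assumes Python 3, where urllib2 lives on as urllib.request, and it talks to a throwaway local server rather than a real site. The names SlowBodyHandler, BODY and release_body are made up for the illustration. The server sends the headers and then holds the body back until the client releases it; if geturl() returns while the body is still held, it cannot have downloaded the whole document.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

BODY = b"x" * 100000               # hypothetical payload
release_body = threading.Event()   # set by the client once it wants the body


class SlowBodyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: sends headers at once, the body only on request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        # Hold the body back until the client releases it (or 10 s pass).
        release_body.wait(timeout=10)
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# Same machinery as urlopen(), but with proxies disabled so the request
# really goes to the local test server.
opener = build_opener(ProxyHandler({}))

response = opener.open(url, timeout=5)   # returns once the headers are parsed
final_url = response.geturl()            # no body needed for this
body_sent_before_geturl = release_body.is_set()

release_body.set()                       # now let the server send the body
html = response.read()                   # this is the call that fetches it
response.close()
server.shutdown()
server.server_close()

print(final_url, body_sent_before_geturl, len(html))
```

Against a real server, the body starts arriving right after the headers, so some of it may already be on the wire or in a buffer by the time geturl() returns. That would be consistent with the couple of kilobytes seen in Wireshark.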

Upvotes: 4

Roman

Reputation: 3198

It does not. Here is my timing test against google.com:

import time, urllib2

x = time.time(); urllib2.urlopen("http://www.google.com").read(); print time.time() - x
0.166881084442

x = time.time(); urllib2.urlopen("http://www.google.com").geturl(); print time.time() - x
0.0772399902344

Upvotes: 3

Kimvais

Reputation: 39578

No. geturl() returns the URL.

For example, urllib2.urlopen("http://www.python.org").geturl() returns the string 'http://www.python.org'.

You can check this sort of thing easily in the Python interactive shell, e.g.:

$ python
Python 2.4.3 (#1, Jul 27 2009, 17:57:39)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> u = urllib2.urlopen("http://www.python.org")
>>> u.geturl()
'http://www.python.org'
>>>

Upvotes: 1

Lukáš Lalinský

Reputation: 41306

From the documentation:

The geturl() method returns the real URL of the page. In some cases, the HTTP server redirects a client to another URL. The urlopen() function handles this transparently, but in some cases the caller needs to know which URL the client was redirected to. The geturl() method can be used to get at this redirected URL.
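
The redirect case can be sketched as follows. This is only an illustration under stated assumptions: it uses Python 3's urllib.request (the successor to urllib2) and a throwaway local server, and the RedirectHandler class and the /old and /new paths are invented for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects to /new, which serves a small page."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            page = b"moved here"
            self.send_response(200)
            self.send_header("Content-Length", str(len(page)))
            self.end_headers()
            self.wfile.write(page)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Same machinery as urlopen(), but with proxies disabled so the request
# really goes to the local test server.
opener = build_opener(ProxyHandler({}))

response = opener.open(base + "/old", timeout=5)  # the redirect is followed silently
final_url = response.geturl()                     # URL that was actually served
body = response.read()
response.close()
server.shutdown()
server.server_close()

print(final_url)  # ends in /new, not /old
```

The request went to /old, but geturl() reports /new, which is the case the documentation describes.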

Upvotes: 5
