Dylan Hettinger

Reputation: 795

urllib2 download HTML file

Using urllib2 in Python 2.7.4, I can readily download an Excel file:

import urllib2

output_file = 'excel.xls'
url = 'http://www.nbmg.unr.edu/geothermal/GEOTHERM-30Jun11.xls'
file(output_file, 'wb').write(urllib2.urlopen(url).read())

This results in the expected file that I can use as I wish.

However, trying to download just an HTML file gives me an empty file:

output_file = 'webpage.html'
url = 'http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html'
file(output_file, 'wb').write(urllib2.urlopen(url).read())

I had the same results using urllib. There must be something simple I'm missing or don't understand. How do I download an HTML file from a URL? Why doesn't my code work?

Upvotes: 1

Views: 4425

Answers (3)

Ricardo

Reputation: 136

If you want to download files or simply save a web page, you can use urlretrieve (from the urllib library) instead of calling read and write yourself.

import urllib
# urllib.urlretrieve(url, filename): fetch url and save it as filename
urllib.urlretrieve("http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html", "doc.html")

If you need a timeout, set a default at the start of your file, before any request is made:

import socket
socket.setdefaulttimeout(25)  # seconds
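
As an alternative to a process-wide default, urlopen also accepts a per-call timeout argument. The following is only a minimal sketch: the download helper and its parameter names are made up for illustration, and the fallback import is an assumption added so the same code also runs on Python 3, where urllib2 was merged into urllib.request.

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def download(url, output_file, timeout=25):
    """Fetch url, write the raw bytes to output_file, return the byte count.

    Hypothetical helper for illustration; timeout is in seconds and
    applies to this call only.
    """
    response = urlopen(url, timeout=timeout)
    try:
        data = response.read()
    finally:
        response.close()
    # Binary mode keeps the bytes exactly as the server sent them.
    with open(output_file, "wb") as fh:
        fh.write(data)
    return len(data)
```

Calling download("http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html", "doc.html") would then save the page and return how many bytes were written, which makes an empty response easy to spot.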

Upvotes: 3

Kane Blueriver

Reputation: 4268

I'm also running Python 2.7.4, on OS X 10.9, and the code works fine here.

So I think some other problem may be preventing it from working. Can you open "http://www.nbmg.unr.edu/geothermal/GEOTHERM-30Jun11.xls" in your browser?
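
One way to narrow down such a problem from Python itself is to look at what the server actually returned before writing anything to disk. This is a rough diagnostic sketch, not a fix: inspect_url is a hypothetical helper name, and the fallback import is an assumption added so the code also runs on Python 3.

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def inspect_url(url, timeout=25):
    """Return (status, content_type, body) for url.

    Hypothetical helper for illustration. status is the HTTP status
    code (it may be None for non-HTTP schemes such as file://).
    """
    response = urlopen(url, timeout=timeout)
    try:
        status = response.getcode()
        content_type = response.info().get("Content-Type")
        body = response.read()
    finally:
        response.close()
    return status, content_type, body
```

If len(body) is 0 here, the empty file comes from the response itself rather than from the code that writes it; if the body is non-empty, the problem is more likely on the writing side.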

Upvotes: 1

Hugo Rodger-Brown

Reputation: 11582

This may not directly answer the question, but if you're working with HTTP and have sufficient privileges to install Python packages, I'd really recommend doing this with 'requests'. There's a related answer here - https://stackoverflow.com/a/13137873/45698
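
For reference, a download with 'requests' might look roughly like the sketch below. It assumes the third-party requests package is installed; save_page and its optional session parameter are hypothetical names chosen for illustration, and the linked answer may do this differently.

```python
def save_page(url, path, session=None, timeout=25):
    """Download url with requests and write the body to path.

    Hypothetical helper for illustration. Returns the number of bytes
    written. Pass an existing requests.Session to reuse connections.
    """
    if session is None:
        import requests  # third-party; imported here so it is only needed when used
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(path, "wb") as fh:
        fh.write(response.content)  # response.content is the raw body as bytes
    return len(response.content)
```

Checking the returned byte count, or letting raise_for_status fail loudly, makes it harder to end up with a silently empty file.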

Upvotes: 0
