John

Reputation: 13699

Python Downloader

So I am trying to write a Python script to download a picture file. I found this function via Google, but every picture I download with it comes out "corrupt". Any ideas?

def download(url):
    """Copy the contents of a file from a given URL
    to a local file.
    """
    import urllib
    webFile = urllib.urlopen(url)
    localFile = open(url.split('/')[-1], 'w')
    localFile.write(webFile.read())
    webFile.close()
    localFile.close()

Edit: the code tag didn't preserve the indentation very well, but I can assure you it is there; that is not my problem.

Upvotes: 2

Views: 1470

Answers (5)

xlTobylx

Reputation: 67

It's coming out corrupt because the function you're using opens the file in text mode ('w') and writes the bytes to it as if they were plain text. What you need to do is write the bytes in binary mode ('wb'). Here's an idea of what you should do:

import urllib

def Download(url, filename):
    Data = urllib.urlopen(url).read()
    File = open(filename, 'wb')
    File.write(Data)  # file methods are lowercase: write, not Write
    #Neatly close off the file...
    File.flush()
    File.close()
    #Cleanup, for you neat-freaks.
    del Data, File
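On Python 3 the same mistake fails loudly instead of silently: a file opened with 'w' only accepts `str`, so writing downloaded `bytes` to it raises `TypeError`. A small sketch of the difference, assuming Python 3 (the helper name `try_write` is hypothetical):

```python
import os
import tempfile


def try_write(data: bytes, mode: str) -> bool:
    """Write data to a temporary file opened with the given mode.

    Returns True if the write succeeded and the file's bytes match data,
    False if the file object rejected the bytes with TypeError.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        try:
            with open(path, mode) as f:
                f.write(data)
        except TypeError:
            # Text-mode files ('w') accept only str, never bytes.
            return False
        with open(path, "rb") as f:
            return f.read() == data
    finally:
        os.remove(path)
```

With the PNG signature `b"\x89PNG\r\n\x1a\n"`, `try_write(sig, "w")` returns False and `try_write(sig, "wb")` returns True.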

Upvotes: 2

Binary Phile
Binary Phile

Reputation: 2668

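import subprocess

outfile = "foo.txt"
url = "http://some/web/site/foo.txt"
# Pass the arguments as a list so no shell quoting is needed;
# -f makes curl fail on HTTP errors instead of saving the error page.
subprocess.check_call(["curl", "-f", "-o", outfile, url])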

Shelling out may seem inelegant, but once you start running into issues with more sophisticated sites, curl has a wealth of logic for getting you past the barriers web servers put up (cookies, authentication, sessions, etc.).

wget is another alternative.
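If you want to use whichever of the two tools is installed, a minimal sketch, assuming Python 3, could look for curl first and fall back to wget. The function names `build_download_command` and `download_with_external_tool` are hypothetical; `-f`/`-o` are curl's fail-on-error and output-file options, and `-O` is wget's output-document option.

```python
import shutil
import subprocess


def build_download_command(url, outfile, which=shutil.which):
    """Return an argument list for curl (preferred) or wget, or None if neither is on PATH."""
    if which("curl"):
        # -f: fail on HTTP errors; -o: write the body to outfile
        return ["curl", "-f", "-o", outfile, url]
    if which("wget"):
        # -O: write the downloaded document to outfile
        return ["wget", "-O", outfile, url]
    return None


def download_with_external_tool(url, outfile):
    cmd = build_download_command(url, outfile)
    if cmd is None:
        raise RuntimeError("neither curl nor wget was found on PATH")
    # check_call raises CalledProcessError if the tool exits non-zero.
    subprocess.check_call(cmd)
```

The `which` parameter exists only so the lookup can be swapped out, e.g. in tests.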

Upvotes: 0

Jochen Ritzel

Reputation: 107588

You can simply do

urllib.urlretrieve(url, filename)

and save yourself the trouble.
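In Python 3 this function lives at `urllib.request.urlretrieve`, as part of the module's legacy interface. A minimal sketch, assuming Python 3 (the wrapper name `retrieve` is hypothetical):

```python
import urllib.request


def retrieve(url, filename):
    """Download url to filename and return the local file name."""
    # urlretrieve opens the destination in binary mode itself and
    # returns a (local_filename, headers) tuple.
    local_filename, _headers = urllib.request.urlretrieve(url, filename)
    return local_filename
```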

Upvotes: 6

Zack Bloom

Reputation: 8417

You must include the 'b' flag if you intend to write a binary file. Line 7 becomes:

localFile = open(url.split('/')[-1], 'wb')

None of this is necessary for the code to work, but in the future you might consider:

  • Importing outside of your functions.
  • Using os.path.basename, rather than string parsing to get the name component of a path.
  • Using the with statement to manage files, rather than having to manually close them. It makes your code cleaner, and it ensures that they are properly closed if your code throws an exception.
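One caveat on the `basename` suggestion: applied to a whole URL, `os.path.basename` keeps any query string or fragment, so `http://example.com/images/cat.png?size=large` would yield `cat.png?size=large`. A sketch, assuming Python 3 (the helper name `name_from_url` and the example URL are hypothetical), that takes the name from the URL's path component only:

```python
import posixpath
from urllib.parse import urlsplit


def name_from_url(url: str) -> str:
    """Return the last segment of the URL's path, ignoring query and fragment."""
    # urlsplit separates scheme, netloc, path, query and fragment; URL
    # paths always use '/', so posixpath.basename is used on any platform.
    return posixpath.basename(urlsplit(url).path)
```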

I would rewrite your code as:

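import contextlib
import os.path
import urllib

def download(url):
    """Copy the contents of a file from a given URL
    to a local file in the current directory.
    """
    # The object returned by urllib.urlopen() is not a context manager,
    # so wrap it in contextlib.closing() to use it in a with statement.
    with contextlib.closing(urllib.urlopen(url)) as webFile:
        with open(os.path.basename(url), 'wb') as localFile:
            localFile.write(webFile.read())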
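For Python 3, where `urllib.request.urlopen` does return a context manager, a comparable sketch could drop `contextlib.closing` and stream the response with `shutil.copyfileobj` rather than reading it into memory in one call. The `dest_dir` parameter is an assumption added for illustration.

```python
import os.path
import posixpath
import shutil
import urllib.parse
import urllib.request


def download(url, dest_dir="."):
    """Copy the contents of a file from a given URL to a local file in dest_dir."""
    # Name the file after the URL's path component (ignores ?query and #fragment).
    name = posixpath.basename(urllib.parse.urlsplit(url).path)
    path = os.path.join(dest_dir, name)
    # Both objects are closed by the with statement, even on exceptions;
    # copyfileobj copies in chunks, and 'wb' keeps the bytes untouched.
    with urllib.request.urlopen(url) as web_file, open(path, "wb") as local_file:
        shutil.copyfileobj(web_file, local_file)
    return path
```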

Upvotes: 3

Gintautas Miliauskas

Reputation: 7892

You need to open the local file in binary mode:

localFile = open(url.split('/')[-1], 'wb')

Otherwise the line-ending bytes get mangled: on Windows, text mode translates every LF byte in the binary stream into CR/LF when writing, which corrupts the file.
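To see that translation without a Windows machine, a sketch assuming Python 3 can push bytes through an `io.TextIOWrapper` configured with `newline="\r\n"`, which converts each written `"\n"` the way Windows text mode does. The helper name `simulate_text_mode_write` is hypothetical; latin-1 is used only because it maps every byte to exactly one character and back.

```python
import io


def simulate_text_mode_write(data: bytes) -> bytes:
    """Return the bytes that would land on disk after LF -> CR/LF translation."""
    raw = io.BytesIO()
    # newline="\r\n" makes the wrapper translate every "\n" written into "\r\n".
    text = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
    text.write(data.decode("latin-1"))
    text.detach()  # flush and release the BytesIO without closing it
    return raw.getvalue()
```

The 8-byte PNG signature `b"\x89PNG\r\n\x1a\n"` comes out as the 10 bytes `b"\x89PNG\r\r\n\x1a\r\n"`; the signature deliberately contains CR/LF and LF so that this kind of conversion is detected.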

Upvotes: 5
