carl

Reputation: 4436

Only download new files (wget -N) in Python

I am trying to use a Python package to download only new files. So far, all I can do is download unconditionally, like this:

import wget

outdir = ""
url = ""
filename = wget.download(url, out=outdir)

But how can I tell wget to download only new files? On the command line I did it with

wget -N url

which downloads only the new files. The Python package wget does not seem to have any equivalent of the -N flag. Does anybody know whether there is a way to do this with wget for Python, or is there another Python package that can do it?

Upvotes: 1

Views: 1294

Answers (1)

John

Reputation: 13699

If this is the wget library you are talking about, then it is built on top of urllib rather than being a wrapper around the wget executable. So you have a couple of options.

  • If you want the -N functionality with this library, you'll have to implement it yourself. Here is how wget determines what counts as a new file. It uses three techniques: it downloads files whose names do not already exist locally; for HTTP, it compares the Last-Modified header against the local file's timestamp; for FTP, it issues a LIST command and tries to parse the output as if it were the output of ls -l.

  • If you are running this script on a system with a wget executable in the path, then you can use subprocess.

Here is the code for that.

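import subprocess

url = ''
# Run the system wget with timestamping (-N) and wait for it to finish;
# check=True raises CalledProcessError if wget exits with an error.
subprocess.run(['wget', '-N', url], check=True)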
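
For the first option (implementing it yourself), here is a minimal sketch of the HTTP case using only the standard library. It is not what wget does internally, just an approximation: it sends an If-Modified-Since header based on the local file's modification time and treats a 304 response as "nothing new". The function name download_if_newer and its parameters are made up for illustration, and it assumes the server honors If-Modified-Since and sends a well-formed Last-Modified header.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate, parsedate_to_datetime


def download_if_newer(url, dest):
    """Download url to dest only if dest is missing or the server copy changed.

    Hypothetical helper: returns True if the file was downloaded,
    False if the local copy was kept.
    """
    request = urllib.request.Request(url)
    if os.path.exists(dest):
        # Ask the server to send the body only if it changed after our copy.
        mtime = os.path.getmtime(dest)
        request.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    try:
        with urllib.request.urlopen(request) as response:
            data = response.read()
            last_modified = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is up to date
            return False
        raise
    with open(dest, "wb") as f:
        f.write(data)
    if last_modified:
        # Copy the server's timestamp onto the local file so the next
        # comparison is made against the server's own clock.
        ts = parsedate_to_datetime(last_modified).timestamp()
        os.utime(dest, (ts, ts))
    return True
```

You would call it as download_if_newer(url, os.path.join(outdir, "some_name")), where the file name is something you pick yourself. The FTP case is not covered by this sketch.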

Upvotes: 1
