
Reputation: 154

Python Wget: Check for duplicate files and skip if it exists?

So I'm downloading files with wget and I want to check whether a file exists before I download it. I know the CLI version has an option for this, but I don't see one in the Python module. Here's what I'm trying to do:

import wget

# check if the file exists
# if not, download it
wget.download(url, path)

wget works out the filename from the URL, so I don't need to name the file myself. This is important because the files already have names and I don't want to rename them.

If there is an alternative way to download files that lets me check for existing files, please tell me! Thanks!

Upvotes: 6

Views: 4936

Answers (3)

Giorgos Myrianthous

Reputation: 39930

wget.download() doesn't have any such option. The following workaround should do the trick for you:

import subprocess

url = "https://url/to/index.html"
path = "/path/to/save/your/files"
# -nc (--no-clobber): don't download the file again if it already exists
# -P: directory to save the file into
subprocess.run(["wget", "-nc", "-P", path, url])

If the file is already there, you will get the following message:

File ‘index.html’ already there; not retrieving.

EDIT: If you are running this on Windows, you'd also have to include shell=True:

subprocess.run(["wget", "-nc", "-P", path, url], shell=True)

Upvotes: 3

nathancy

Reputation: 46670

From the source code, the wget.download() function doesn't seem to have the option for additional parameters such as -nc or -N for skipping downloads if the file already exists. Only the CLI version seems to support this.

The function:

def download(url, out=None, bar=bar_adaptive):
    ...

You can only pass the URL, the output location (out) and the progress bar callback (bar).

Upvotes: 1

John Gordon

Reputation: 33353

I don't see that option in the Python module.

You could try to guess the filename that will be used (typically it will be the part of the url after the last slash character).
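A minimal sketch of the filename-guessing approach, assuming the saved name is the last path segment of the URL (with the query string dropped). The helper names guess_filename and download_if_missing are hypothetical. The download callable defaults to urllib.request.urlretrieve from the standard library; wget.download can be passed instead, since its second positional parameter is out.

```python
import os
import urllib.request
from urllib.parse import urlparse


def guess_filename(url):
    """Hypothetical helper: return the last path segment of the URL, or None.

    Assumption: this matches the name the downloader will use. A redirect or a
    Content-Disposition header may lead the downloader to pick another name.
    """
    name = os.path.basename(urlparse(url).path)  # urlparse drops ?query
    return name or None


def download_if_missing(url, directory, download=urllib.request.urlretrieve):
    """Hypothetical helper: skip the download if the guessed file exists.

    `download` is any callable taking (url, target_path) positionally.
    Returns (target_path, downloaded_flag).
    """
    name = guess_filename(url)
    if name is None:
        raise ValueError(f"cannot guess a filename from {url!r}")
    target = os.path.join(directory, name)
    if os.path.exists(target):
        return target, False  # already there: don't download again
    download(url, target)
    return target, True
```

Usage with the wget module would look like download_if_missing(url, path, wget.download). Since the name is only a guess, treat the existence check as a heuristic.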

Or you could download the file to a new temporary directory and then check if that filename exists in your main directory.
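The temporary-directory approach could be sketched like this. It assumes download(url, out_dir) saves the file inside out_dir and returns the saved path, which is how wget.download(url, out=tmp) appears to behave; download_via_tempdir is a hypothetical name.

```python
import os
import shutil
import tempfile


def download_via_tempdir(url, directory, download):
    """Hypothetical helper: download to a scratch dir, keep the file only if new.

    Assumption: download(url, out_dir) saves into out_dir and returns the path.
    Returns (final_path, kept_flag).
    """
    with tempfile.TemporaryDirectory() as tmp:
        saved = download(url, tmp)
        final = os.path.join(directory, os.path.basename(saved))
        if os.path.exists(final):
            return final, False  # duplicate: temp copy is deleted with tmp
        shutil.move(saved, final)  # new file: move it into the main directory
        return final, True
```

The trade-off is that every call downloads the file, even when it turns out to be a duplicate, but the comparison uses the name the downloader actually chose instead of a guess.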

Upvotes: 1
