the viking

Reputation: 61

Wget to resume downloading files from where it stopped

I am new to Wget, and I am wondering if there is a way to resume downloading files from where I stopped. For example, I am downloading a bunch of files from a website that has files like this: 1.pdf 2.pdf 3.pdf 4.pdf

For some reason I stopped downloading after wget had fetched the first 2 files, and I moved those files to another storage device that I cannot access right now. Can I run a command that excludes the first 2 files I already downloaded and continues from the 3rd file onwards?

I am using this command already:

wget -m -np -c -U "MyDir" -R "index.html*" "TheURL"

Sorry for the clumsy way I explained my issue, and thanks in advance for your responses.

Upvotes: 6

Views: 7584

Answers (2)

social

Reputation: 517

wget -c https://url.com/filename.ext

Tested on Debian 12: I terminated wget, reran the command, and it resumed downloading the original file from 64%. The source web server must support it (note the 206 Partial Content response):

HTTP request sent, awaiting response... 206 Partial Content
Length: 1222609558 (1.1G), 43520449 (42M) remaining [application/octet-stream]

From man wget:

-c

--continue

Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. For instance:

wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

If there is a file named ls-lR.Z in the current directory, Wget will assume that it is the first portion of the remote file, and will ask the server to continue the retrieval from an offset equal to the length of the local file.

source: https://unix.stackexchange.com/a/564363

Upvotes: 1

Silas S. Brown

Reputation: 1652

You are already using the -R option to reject filenames matching a particular pattern (your -R "index.html*" rejects any filename starting with index.html), so you could simply add more filenames to that reject list: use -R "index.html*,1.pdf,2.pdf" if you know you already have 1.pdf and 2.pdf saved on another computer, and you're not concerned about files with identical names in other directories. (I'm not sure I understand why you're rejecting index.html* though, as that might result in some file listings not being scanned.)
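If the reject list gets long, one possible shortcut (not part of the answer above; the done directory and its contents are hypothetical stand-ins created just for this demo) is to build the -R value in the shell from the files you already have:

```shell
# Sketch only: the "done" directory stands in for wherever your
# already-downloaded files are listed; here we create it for the demo.
mkdir -p done && touch done/1.pdf done/2.pdf

# Start from the existing reject pattern and append each saved filename.
REJECT="index.html*"
for f in done/*.pdf; do
    REJECT="$REJECT,$(basename "$f")"
done
echo "$REJECT"    # index.html*,1.pdf,2.pdf

# Then pass it to the original command ("TheURL" is the question's placeholder):
# wget -m -np -c -U "MyDir" -R "$REJECT" "TheURL"
```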

For more complex situations (or if you just don't fancy writing a very long -R parameter), it might be easier to create empty files using touch before running wget, and delete the empty files afterwards. This works because you are using wget -m, which (at least in versions of wget post-2001 or so) turns on -N (timestamp checking)—as long as the server supports timestamps (most do), wget will tell the server it wants the file only if it's newer than the timestamp of the existing file, i.e. "newer than just now" if you put an empty file there just now. The empty file does have to be correctly named and in the correct directory though.
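The touch approach might look something like this sketch (the example.com/docs path is a hypothetical mirror directory matching the question's file names, and the wget line is left commented out as a placeholder):

```shell
# Sketch only: create empty placeholders for 1.pdf and 2.pdf so that
# wget -m (which implies -N) only requests files newer than "just now".
# example.com/docs is a made-up mirror directory for illustration.
mkdir -p example.com/docs
touch example.com/docs/1.pdf example.com/docs/2.pdf

# Run the mirror here; wget skips any file no newer than its placeholder:
# wget -m -np -U "MyDir" -R "index.html*" "TheURL"

# Afterwards, delete placeholders that are still empty (never downloaded):
find example.com -type f -empty -delete
```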

Another adjustment you might like to make is to replace -m with -r -nc -l inf (because normally -m means -r -N -l inf and I suggest replacing -N with -nc). Whereas -N checks timestamps, -nc avoids downloading any file that already exists, regardless of timestamp (so it works even if the server does not support timestamps), but more importantly, -nc results in files you've already downloaded being scanned for links, while -N does not. This is useful in conjunction with -w (--wait) if you need to mirror a large server slowly, because if you have to reboot the computer or something before it's finished, you can then resume mirroring from where it left off and wget still takes into account any links from files it fetched last time.

On the other hand, -N is better if your previous download was complete and you just need to check for updates. Note that mirroring with -N still relies on any updated file being linked from a page that also has an updated timestamp (and if that page is reached via a link, rather than directly from a URL you supplied, then at least one of its linking pages must also have an updated timestamp for its update to get noticed, and so on). There does not currently seem to be a way to tell wget to parse HTML pages skipped by -N as it can parse HTML pages skipped by -nc.

Upvotes: 3
