herci

Reputation: 385

Compare file sizes and download if they're different via wget

I'm downloading some .mp3 files (all legal) via wget:

wget -r -nc files.myserver.com

I sometimes have to stop the download, and when that happens a file is left partially downloaded. For example, a 10-minute record.mp3 file becomes a 4-minute record.mp3 file. It plays correctly but is incomplete.

If I run the same command again, wget skips record.mp3 because it already exists on my local computer, even though it isn't complete.

I wonder if there is a way to check the file sizes and re-download a file when its size on the remote server differs from its size on my local computer. (I've learned that the --spider option reports the file size, but is there an option that automatically compares the sizes and decides whether to download?)

Upvotes: 4

Views: 6865

Answers (3)

hupster

Reputation: 41

I would go with wget's -N option for timestamping, but note that wget will only compare the file sizes if you also specify the --no-if-modified-since option. Without it, incomplete files are indeed skipped on the next run because they receive a timestamp of the current time, which is newer than that on the server.

The reason is probably that with only -N, a GET request is sent for the file with the If-Modified-Since field set. The server responds with either 200 or 304, but the 304 doesn't contain the file size so wget can't check it.

With --no-if-modified-since wget sends a HEAD request instead to get the timestamp and file size, and checks both.
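To make the size check concrete, here is a minimal sketch of doing the same comparison by hand. It assumes the server reports a Content-Length header; the helper names content_length and needs_download are hypothetical and not part of wget.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of wget.

# Print the last Content-Length value found in HTTP headers read from stdin.
content_length() {
    tr -d '\r' | awk 'tolower($1) == "content-length:" { len = $2 }
                      END { if (len != "") print len }'
}

# Succeed (exit 0) when the local file $1 is missing or its size differs from $2.
needs_download() {
    [ -f "$1" ] || return 0
    local_size=$(wc -c < "$1" | tr -d ' ')
    [ "$local_size" != "$2" ]
}

# Possible use against a real server ($url is a placeholder):
# size=$(wget --spider --server-response "$url" 2>&1 | content_length)
# needs_download record.mp3 "$size" && wget -O record.mp3 "$url"
```

wget prints the response headers on stderr, hence the 2>&1 in the commented usage. This only compares sizes; it does not look at timestamps the way -N does.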

What I use for recursive download of a folder:

wget -T 300 -nv -t 1 -r -nd -np -l 1 -N --no-if-modified-since -P "$my_folder" "$my_url"

With:

-T 300: Set the network timeout to 300 seconds
-nv: Turn off verbose without being completely quiet
-t 1: Set number of tries to 1
-r: Turn on recursive retrieving
-nd: Do not create a hierarchy of directories when retrieving recursively
-np: Do not ever ascend to the parent directory when retrieving recursively
-l 1: Specify recursion maximum depth 1
-N: Turn on time-stamping
--no-if-modified-since: Do not send If-Modified-Since header in ‘-N’ mode, send preliminary HEAD request instead

Upvotes: 4

slava

Reputation: 837

If you need to check whether a file was partially downloaded (has a different size) or was updated on the remote server (by timestamp), and re-download it in either case, use the -N option.

Here is some additional info about the -N (--timestamping) option from the Wget docs:

If the local file does not exist, or the sizes of the files do not match, Wget will download the remote file no matter what the time-stamps say.

Added From: https://www.gnu.org/software/wget/manual/wget.html (Chapter: 5 Time-Stamping)

Upvotes: -1

Dima Chubarov

Reputation: 17179

You may try the -c option to continue the download of partially downloaded files; however, the manual gives an explicit warning:

You need to be especially careful of this when using -c in conjunction with -r, since every file will be considered as an "incomplete download" candidate.

While there is no perfect solution to this problem, you could try the -N option to turn on timestamping. This might prevent errors when the file has changed on the server, but only if the server supports timestamping and partial downloads. Try it and see how it goes.

  wget -r -N -c files.myserver.com
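As a rough illustration of why -c depends on the server supporting partial downloads: resuming works by asking only for the bytes after the local file's current size, using an HTTP Range request header. The sketch below just builds that header from a local file; the function name resume_range is hypothetical, and nothing here contacts a server.

```shell
#!/bin/sh
# Hypothetical helper: print the Range header that would resume a download
# from the current size of local file $1 (offset 0 if the file does not exist).
resume_range() {
    if [ -f "$1" ]; then
        offset=$(wc -c < "$1" | tr -d ' ')
    else
        offset=0
    fi
    printf 'Range: bytes=%s-\n' "$offset"
}

# Example: resume_range record.mp3
```

Since the request is based only on the local size, a resume cannot tell by itself whether the remote file changed in the meantime, which is why combining -c with -N is suggested above.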

Upvotes: 3
