akiva

Reputation: 2737

Is there a curl/wget option that prevents saving files in case of http errors?

I want to download a lot of urls in a script but I do not want to save the ones that lead to HTTP errors.

As far as I can tell from the man pages, neither curl nor wget provides such functionality. Does anyone know of another downloader that does?

Upvotes: 30

Views: 36994

Answers (8)

Heiko Nardmann

Reputation: 170

Perhaps the --remove-on-error option was introduced after this question was asked? It is documented as removing the output file when the transfer fails.
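Assuming a curl recent enough to support the flag, combining it with --fail would look like this (the URL is just a placeholder):

# Assumes a recent curl that supports --remove-on-error;
# -f makes curl fail on HTTP errors, and --remove-on-error
# deletes the partially written output file when that happens.
curl -f --remove-on-error -O http://example.com/myfile.json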

Upvotes: 0

Thomas

Reputation: 182048

I think the -f option to curl does what you want:

-f, --fail

(HTTP) Fail silently (no output at all) on server errors. This is mostly done to better enable scripts etc to better deal with failed attempts. In normal cases when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why and more). This flag will prevent curl from outputting that and return error 22. [...]

However, if the response was actually a 301 or 302 redirect, that still gets saved, even if its destination would result in an error:

$ curl -fO http://google.com/aoeu
$ cat aoeu
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/aoeu">here</A>.
</BODY></HTML>

To follow the redirect to its dead end, also give the -L option:

-L, --location

(HTTP/HTTPS) If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place. [...]
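Combining the two, curl follows the redirect to its dead end and exits with error 22 instead of saving anything:

# -f fails on HTTP errors, -L follows the 301 first, -O names the
# output after the remote file; the dead-end 404 body is not saved.
curl -fLO http://google.com/aoeu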

Upvotes: 31

Oct

Reputation: 1525

One-liner I just set up for this very purpose:

(works only with a single file; might be useful for others)

A=$$; (wget -q "http://example.com/pipo.txt" -O "$A.d" && mv "$A.d" pipo.txt) || { rm -f "$A.d"; echo "Removing temp file"; }

This will attempt to download the file from the remote host. If there is an error, the file is not kept. In all other cases, it's kept and renamed.
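Since the question is about downloading a lot of URLs, here is a sketch wrapping the same idea in a loop; urls.txt (one URL per line) and saving under the remote basename are my assumptions:

# Assumption: urls.txt holds one URL per line; each download lands in a
# temp file and is only renamed to its remote basename on success.
while IFS= read -r url; do
    tmp="$$.d"
    if wget -q "$url" -O "$tmp"; then
        mv "$tmp" "$(basename "$url")"
    else
        rm -f "$tmp"
    fi
done < urls.txt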

Upvotes: 15

Juan Lago

Reputation: 1048

As an alternative, you can download to a temporary file and rotate it into place:

wget http://example.net/myfile.json -O myfile.json.tmp -t 3 -q && mv myfile.json.tmp myfile.json

The previous command always downloads to "myfile.json.tmp", but the file is only rotated into "myfile.json" when the wget exit status equals 0.

This prevents the final file from being overwritten when a network failure occurs.

The advantage of this method is that if something goes wrong, you can inspect the temporary file and see what error message was returned.

The "-t" option sets how many times wget retries the download in case of error (three here).

The "-q" option enables quiet mode; this is important when running under cron, because cron reports any output that wget produces.

The "-O" option sets the output file path and name.

Remember that for cron schedules it is very important to always provide the full path for all files, and in this case for the "wget" binary itself as well.
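For instance, a hypothetical crontab entry following that advice (all paths here are assumptions):

# Full paths everywhere: the wget binary, the temp file, the mv binary,
# and the target file; runs every 15 minutes.
*/15 * * * * /usr/bin/wget http://example.net/myfile.json -O /var/data/myfile.json.tmp -t 3 -q && /bin/mv /var/data/myfile.json.tmp /var/data/myfile.json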

Upvotes: 0

user5739133

Reputation:

NOTE: I am aware that this is an older question, but I believe I have found a better solution for those using wget than any of the above answers provide.

wget -q "$URL" 2>/dev/null

Will save the target file to the local directory if and only if the HTTP status code is within the 200 range (OK).

Additionally, if you wanted to do something like print out an error whenever the request was met with an error, you could check the wget exit code for non-zero values like so:

wget -q "$URL" 2>/dev/null
if [ $? != 0 ]; then
    echo "There was an error!"
fi

I hope this is helpful to someone out there facing the same issues I was.

Update: I just put this into a more script-able form for my own project, and thought I'd share:

function dl {
    # Hop into the file's directory, fetch it, then restore the old cwd.
    pushd "$(dirname "$1")" > /dev/null
    wget -q "$BASE_URL/$1" 2> /dev/null
    if [ $? != 0 ]; then
        echo ">> ERROR could not download file \"$1\"" 1>&2
        exit 1
    fi
    popd > /dev/null
}
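Hypothetical usage (the BASE_URL value and the relative path are my assumptions; the subdirectory must already exist, since the function cds into it):

# BASE_URL must be set before calling dl.
BASE_URL="http://example.com/files"
dl "docs/readme.txt"   # saves readme.txt into ./docs, or exits with an error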

Upvotes: 1

sajal

Reputation: 792

Ancient thread... I landed here looking for a solution and ended up writing some shell code to do it.

if [ "$(curl -s -w "%{http_code}" --compressed -o /tmp/something \
      http://example.com/my/url/)" = "200" ]; then
  echo "yay"; cp /tmp/something /path/to/destination/filename
fi

This downloads the output to a temp file and creates/overwrites the output file only if the status was 200. My use case is slightly different: in my case the output takes more than 10 seconds to generate, and I did not want the destination file to remain blank for that duration.
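A small variation on the same idea: writing the temp file next to the destination makes the final rename atomic on the same filesystem, so readers never see a half-written file (the paths are the same assumptions as above):

# Keep the temp file on the same filesystem as the target so mv is an
# atomic rename; on any non-200 status the temp file is discarded.
tmp="/path/to/destination/filename.tmp"
if [ "$(curl -s -w "%{http_code}" --compressed -o "$tmp" \
      http://example.com/my/url/)" = "200" ]; then
  mv "$tmp" /path/to/destination/filename
else
  rm -f "$tmp"
fi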

Upvotes: 3

vmonteco

Reputation: 15443

I have a workaround to propose: it does download the file, but it also removes it if its size is 0 (which happens if a 404 occurs).

wget -O <filename> <url/to/file>
if [[ $(du <filename> | cut -f 1) == 0 ]]; then
    rm <filename>
fi

It works for zsh but you can adapt it for other shells.

But note that the empty file only gets created in the first place if you provide the -O option.
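For shells without the [[ ... ]] construct, an adaptation using test -s (true only when the file exists and is non-empty) could look like this:

wget -O <filename> <url/to/file>
# -s is true only for a non-empty file, so an empty 404 download is removed.
if [ ! -s <filename> ]; then
    rm -f <filename>
fi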

Upvotes: 0

Marc Queralt

Reputation: 11

You can download the file without saving it by using the "-O -" option, as in:

wget -O - http://jagor.srce.hr/

You can get more information at http://www.gnu.org/software/wget/manual/wget.html#Advanced-Usage

Upvotes: -4
