Butters

Reputation: 887

Capturing wget errors with python

I have a script that uses Python and wget to download a website and then perform some tasks with the files. I am using the line os.system("wget -m -w 2 -P " + directory) to call wget, recursively downloading every page in the domain. This works fine, but it has now become necessary to monitor wget for errors when it follows a link and fails to download a file (think of a 404 error when trying to access a page).

It is not a matter of getting the exit code, but of looking at each 'block' of output that wget prints.

Is there an easy way to look through the wget output with Python without having to redirect it to a file, and then search the file for an identifying string of text?

Upvotes: 0

Views: 2172

Answers (2)

bruno desthuilliers

Reputation: 77902

If you only want the exit code, that's what os.system() returns (warning: on Unix it's the process's encoded wait status, with the actual exit code in the high byte, so 0 means 'no error' and anything else an error).
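A minimal sketch of turning that value into a plain exit code, assuming a Unix system and Python 3.9+ (for os.waitstatus_to_exitcode). The shell command `exit 3` is only a stand-in for the wget call:

```python
import os

# On Unix, os.system() returns an encoded wait status, not the bare exit code.
status = os.system("exit 3")  # stand-in for the real wget command

# Python 3.9+: decode the wait status into the process's exit code.
exit_code = os.waitstatus_to_exitcode(status)

if exit_code == 0:
    print("no error")
else:
    print("command failed with exit code", exit_code)
```

On older Python versions, os.WEXITSTATUS(status) does the same decoding on Unix.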

If you want more detailed information, you'll have to use the subprocess module (https://docs.python.org/2/library/subprocess.html#module-subprocess) to pipe the subprocess's stderr, where wget writes its log, back to your Python code. Or you could use Python instead of wget: there are quite a few Python-based crawlers available.
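A rough sketch of the stderr approach, reading wget's log line by line as it is produced so no temporary file is needed. It assumes an English-locale wget whose log starts each request with a line like `--2015-03-10 12:00:00--  http://host/path` and reports HTTP failures with a line containing `ERROR 404: Not Found.`; other wget versions or locales may format these differently. The names `scan_wget_log`, `mirror` and the `url` parameter are hypothetical, not part of the original script:

```python
import re
import subprocess

# Assumed log format: each download attempt begins with "--YYYY-MM-DD HH:MM:SS--  URL".
START_RE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")
# Assumed log format: HTTP failures are reported as "ERROR <code>: <reason>."
ERROR_RE = re.compile(r"ERROR (\d{3}): (.*)")


def scan_wget_log(lines):
    """Yield (url, status_code, reason) for each failed request in wget's log."""
    current_url = None
    for line in lines:
        line = line.rstrip("\n")
        start = START_RE.match(line)
        if start:
            # Remember which URL the following "block" of output belongs to.
            current_url = start.group(1)
            continue
        error = ERROR_RE.search(line)
        if error:
            yield current_url, int(error.group(1)), error.group(2).rstrip(".")


def mirror(directory, url):
    """Run wget and report errors as they appear (hypothetical helper)."""
    proc = subprocess.Popen(
        ["wget", "-m", "-w", "2", "-P", directory, url],
        stderr=subprocess.PIPE,   # wget writes its progress log to stderr
        universal_newlines=True,  # read text lines instead of bytes
    )
    errors = []
    for failed in scan_wget_log(proc.stderr):
        print("failed:", failed)
        errors.append(failed)
    proc.wait()
    return proc.returncode, errors
```

Parsing the stream as it arrives means you can react to a 404 while the mirror is still running, rather than after the whole crawl finishes.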

Upvotes: 2

mcnnowak

Reputation: 285

From what I can tell, os.system returns the exit code of the command.

So, the following should work:

code = os.system("wget -m -w 2 -P {}".format(directory))
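If the exit code is all you need, subprocess.call returns it directly without the Unix wait-status encoding and without going through a shell. A small sketch that also maps the code to the meanings listed in the wget manual; the helper name `run_and_describe` is hypothetical, and a Python one-liner exiting with 8 stands in for a real wget run:

```python
import subprocess
import sys

# Exit statuses as documented in the wget manual.
WGET_EXIT_CODES = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def run_and_describe(cmd):
    """Run cmd (a list of arguments) and return (exit_code, meaning)."""
    code = subprocess.call(cmd)  # the process's exit code, no shell involved
    return code, WGET_EXIT_CODES.get(code, "Unknown exit code")


# Stand-in process that exits with 8; a real call might look like
# run_and_describe(["wget", "-m", "-w", "2", "-P", directory, url])
print(run_and_describe([sys.executable, "-c", "import sys; sys.exit(8)"]))
# prints (8, 'Server issued an error response')
```

Note that a code of 8 only says that some request got an error response from the server; it does not say which URL failed.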

Upvotes: 0
