M. A.

Reputation: 424

Use wget to crawl specific URLs

I am trying to crawl links from a website and then use a download manager to download the files.

I've tried:

wget --wait=20 --limit-rate=20K -r -p -U Mozilla "www.mywebsite.com"

I can't figure out how to use wget with regular expressions to save only the desired links!

Upvotes: 1

Views: 2046

Answers (1)

Stephan

Reputation: 43013

wget offers a wide variety of options for fine-tuning which files are downloaded during a recursive crawl.

Here are a few options that may interest you:

  • --accept-regex urlregex

Download any URL matching urlregex, a regular expression that is matched against the complete URL.

  • --reject-regex urlregex

Ignore any URL matching urlregex, a regular expression that is matched against the complete URL.

  • -L

Tells wget to follow relative links only (long form: --relative).

Relative links example:

<a href="foo.gif">
<a href="foo/bar.gif">
<a href="../foo/bar.gif">

Non-relative links:

<a href="/foo.gif">
<a href="/foo/bar.gif">
<a href="http://www.server.com/foo/bar.gif">
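Putting this together: since --accept-regex is matched against the complete URL, you can sanity-check a pattern locally with grep -E before launching a long crawl (wget's default --regex-type is posix). The site name, the sample URLs, and the `.pdf` pattern below are placeholders, assuming you want to keep only PDF files:

```shell
# Hypothetical accept pattern: keep only URLs ending in .pdf
regex='.*\.pdf$'

# Check which sample URLs the pattern would accept or reject.
for url in \
  "http://www.mywebsite.com/docs/report.pdf" \
  "http://www.mywebsite.com/index.html"; do
  if echo "$url" | grep -Eq "$regex"; then
    echo "accept: $url"
  else
    echo "reject: $url"
  fi
done

# Once the pattern looks right, plug it into the original crawl:
# wget --wait=20 --limit-rate=20K -r -p -U Mozilla \
#      --accept-regex "$regex" "www.mywebsite.com"
```

Swap --accept-regex for --reject-regex to invert the filter, and add -L if the files you want are linked relatively.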


Upvotes: 2
