Reputation: 424
I am trying to crawl links from a website and then use a download manager to download the files.
I've tried:
wget --wait=20 --limit-rate=20K -r -p -U Mozilla "www.mywebsite.com"
but I can't figure out how to use wget or regular expressions to save only the desired links.
Upvotes: 1
Views: 2046
Reputation: 43013
wget offers a wide variety of options for fine-tuning which files are downloaded during a recursive crawl.
Here are a few options that may interest you:
--accept-regex urlregex
Download any URL matching urlregex. urlregex is a regular expression which is matched against the complete URL.
--reject-regex urlregex
Ignore any URL matching urlregex. urlregex is a regular expression which is matched against the complete URL.
-L
Tells wget to follow relative links only.
Relative links example:
<a href="foo.gif">
<a href="foo/bar.gif">
<a href="../foo/bar.gif">
Non-relative links:
<a href="/foo.gif">
<a href="/foo/bar.gif">
<a href="http://www.server.com/foo/bar.gif">
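The relative/non-relative distinction above can be sketched as a small shell function (a minimal illustration, not part of wget itself): an href is non-relative when it starts with "/" or with a URL scheme, and relative otherwise.

```shell
# Illustrative helper: returns 0 (true) for relative hrefs,
# 1 (false) for non-relative ones, mirroring what -L keeps vs. skips.
is_relative() {
  case "$1" in
    /*|http://*|https://*) return 1 ;;  # non-relative: absolute path or full URL
    *) return 0 ;;                      # relative: resolved against the current page
  esac
}

is_relative "foo/bar.gif"  && echo "foo/bar.gif is relative"
is_relative "/foo/bar.gif" || echo "/foo/bar.gif is non-relative"
```

A combined crawl restricted to relative links might then look like (commented, network-dependent):

```shell
# wget --wait=20 --limit-rate=20K -r -L -p -U Mozilla "www.mywebsite.com"
```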
Upvotes: 2