Reputation: 2984
Given a parent URL (say "http://dir.yahoo.com/News_and_Media/"), I want to scrape all URLs on that page and follow the links up to depth X.
I don't want to move to another domain, even if the depth limit would otherwise allow it. For example, when starting from "http://dir.yahoo.com/News_and_Media/", I don't want to follow a depth-2 link that is not under "dir.yahoo.com".
Is there an existing tool that does this?
Upvotes: 1
Views: 1507
Reputation: 94
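GNU Wget can do this: http://www.gnu.org/software/wget/
Specifically, you would want these command-line options in your case:
$ wget -r -l X http://www.example.com/
Replace "http://www.example.com/" with the URL of your choosing and "X" with the depth you want. By default, recursive retrieval stays on the starting host (wget only follows links to other hosts when -H is given), so it will not leave "dir.yahoo.com". Adding -np (--no-parent) also keeps it from climbing above the starting directory.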
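If you would rather collect the URLs in a script than mirror the pages to disk, the same idea (a depth limit plus a same-host check) can be sketched in Python using only the standard library. This is a rough illustration, not what wget does internally: the names `crawl`, `fetch_html`, and `LinkParser` are made up for this example, depth 0 is taken to mean the start page itself, and robots.txt, rate limiting, and non-HTML content are all ignored.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: download a page and decode it leniently."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth, fetch=fetch_html):
    """Breadth-first crawl; returns {url: depth} for same-host URLs only."""
    host = urlparse(start_url).netloc
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # URL is recorded, but its links are not followed
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

Calling `crawl("http://dir.yahoo.com/News_and_Media/", 2)` would return a dictionary mapping each discovered URL on that host to the depth at which it was first found. The `fetch` parameter is there so the downloader can be swapped out, for example with one that adds a delay between requests.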
Upvotes: 3