instanceOfObject

Reputation: 2984

A basic crawler/scraper that can provide all URLs under a parent URL

Given a parent URL (say "http://dir.yahoo.com/News_and_Media/"), I want to scrape all URLs that appear on this page, following links up to depth X.

I don't want to move to another domain, even if the depth limit would otherwise allow it. For example, when crawling "http://dir.yahoo.com/News_and_Media/", I don't want to follow depth-2 links that are not under "dir.yahoo.com".

Is there an existing tool that does this?

Upvotes: 1

Views: 1507

Answers (2)

Here You Go

Reputation: 94

GNU Wget can do this: http://www.gnu.org/software/wget/

Specifically you would want these command line options in your case:

$ wget -r -l X http://www.example.com/

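Replace "http://www.example.com/" with the URL of your choosing and "X" with the depth you want. Recursive retrieval stays on the starting host by default (wget only follows links to other hosts when given -H), so it will not leave "dir.yahoo.com". Adding --no-parent (-np) also keeps it from climbing above the starting directory.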
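If you would rather script it and just collect the URLs instead of downloading the pages, the same idea (breadth-first, depth-limited, same host only) can be sketched in Python with the standard library. This is only a rough illustration, not what wget does internally: the names `LinkParser`, `fetch`, and `crawl` are made up for this example, depth 0 is taken to be the start page itself, and it ignores robots.txt, rate limiting, and non-HTML content.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text, or "" if the request fails."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, max_depth, fetch_page=fetch):
    """Breadth-first crawl from start_url, staying on its host.

    The start page is depth 0; pages at depth >= max_depth are recorded
    but not fetched, so max_depth=1 returns only the links on the start page.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue
        parser = LinkParser()
        parser.feed(fetch_page(url))
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            # Skip other hosts (and mailto:/javascript: links, which have no host).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return sorted(seen)
```

Calling `crawl("http://dir.yahoo.com/News_and_Media/", 2)` would then return the start URL plus every same-host URL found within two link hops. The `fetch_page` parameter is there so the fetching can be swapped out, for example to add a delay between requests or to test against canned pages.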

Upvotes: 3

weeyoung

Reputation: 172

Try WinHTTrack, the Windows version of the HTTrack website copier.

Upvotes: 2
