Reputation: 3523
This may be a weird question, but I am writing a spider and wondering whether there is any way to discover the folders that exist under a site, like:
mysite.com/drupal
mysite.com/wordpress
mysite.com/abc
Is there any way to find this kind of information?
Upvotes: 0
Views: 82
Reputation: 375754
Web sites don't typically advertise their entire set of URLs. You can try a few things:
Read the main page, and follow the links on the page. Each leads to another page, which contains links, and so on.
Guess at common folder names.
Examine the robots.txt file if the site has one. You should be a good citizen and not retrieve pages it forbids you from fetching.
Try to get the site's sitemap, as described here: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156184
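As a rough sketch of the robots.txt and sitemap suggestions, Python's standard `urllib.robotparser` can tell you whether a URL is allowed and which sitemaps the file lists. The user agent name `my-spider`, the `/private/` rule, and the sitemap URL are made up for illustration, and the rules are parsed from an inline string rather than fetched from a real site. `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, standing in for what a real site would serve.
SAMPLE = """\
User-agent: *
Disallow: /private/
Sitemap: http://mysite.com/sitemap.xml
"""

rules = load_rules(SAMPLE)

# Ask before fetching: True means the rules do not forbid this URL.
allowed = rules.can_fetch("my-spider", "http://mysite.com/drupal/")
blocked = rules.can_fetch("my-spider", "http://mysite.com/private/x")

# site_maps() returns None when no Sitemap lines are present.
sitemaps = rules.site_maps() or []
```

Against a live site you would call `set_url(...)` and `read()` on the parser instead of passing in a string. Any sitemap URLs it reports are a good starting list of pages to crawl.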
Upvotes: 1
Reputation: 3493
If you implement a traditional spider, it will only traverse URLs it finds in the content as it goes along. You could try a dictionary or every-string-in-the-universe check at every directory level, but that wouldn't be playing nice.
So, the short answer is "no".
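To illustrate why, here is a minimal sketch of a link-following spider, using only the Python standard library. The `crawl` function, the `fetch` callback, and the in-memory `PAGES` dictionary are all made up for the example; a real spider would download pages over HTTP and honor robots.txt. Note that `/abc/` exists in the sample data but is never discovered, because no page links to it.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one host; fetch(url) returns HTML or None."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site content held in memory instead of fetched over HTTP.
PAGES = {
    "http://mysite.com/": '<a href="/drupal/">Drupal</a> <a href="http://other.example/">x</a>',
    "http://mysite.com/drupal/": '<a href="../wordpress/">WP</a> <a href="/">home</a>',
    "http://mysite.com/wordpress/": "",
    "http://mysite.com/abc/": "<p>never linked</p>",
}

found = crawl("http://mysite.com/", PAGES.get)
```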
Upvotes: 0