Gaurav Sharma

Reputation: 4052

Count the number of pages in a site

I'd like to know how many public pages a site has, for example smashingmagazine.com. Is there a way to count the number of pages?

Upvotes: 2

Views: 1980

Answers (3)

duncmc

Reputation: 968

You can query Google's index using the site: operator, e.g.:

site:domain-to-query.com

This returns a list of the site's pages that are currently indexed by Google. Other search engines provide similar functionality, but I don't know the syntax offhand.

Of course, not all pages may be indexed, and the index may contain pages that no longer exist.

Upvotes: 3

George Johnston

Reputation: 32258

You'll need to recursively scan the markup of each page, starting with your top-level page, looking for links to other pages and crawling through them in turn. You'll also need to keep track of what has already been scanned so you don't get caught in an infinite loop.
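As a rough illustration of the "scan the markup for links" step, here is a minimal Python sketch using only the standard library. The names `LinkCollector` and `extract_links` are made up for this example. It resolves relative links and drops `#fragment` parts, on the assumption that `/about` and `/about#team` should count as one page when you check what has already been scanned.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return the absolute, fragment-free http(s) URLs linked from one page."""
    collector = LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)   # resolve relative links
        absolute, _ = urldefrag(absolute)    # strip any "#fragment" part
        if absolute.startswith(("http://", "https://")):
            links.add(absolute)              # skips mailto:, javascript:, etc.
    return links
```

Because the result is a set of normalized URLs, it can be compared directly against a set of already-scanned pages. Real sites may need more normalization than this (trailing slashes, query strings, redirects), which the sketch does not attempt.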

Upvotes: 0

NG.

Reputation: 22904

You basically need to crawl the site. Your process would be something like:

  • Start at root domain / homepage
  • Look for all links that point within the same domain
  • For each of those links, repeat the steps

Your loop terminates when there are no more uncrawled links pointing within the same domain. Remember to stay within the site; otherwise you'll start crawling external sites.
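The loop described above could be sketched in Python roughly as follows. This is only one way to do it: it uses an explicit queue rather than recursion, and it treats "same domain" as "same host name", which is an assumption (subdomains would count as external). `count_pages`, `get_links`, and `max_pages` are hypothetical names; `get_links` is a function you supply that returns the URLs linked from a given page, for instance by downloading it and pulling out its anchors.

```python
from collections import deque
from urllib.parse import urlparse


def count_pages(start_url, get_links, max_pages=10000):
    """Breadth-first crawl that counts the pages found on start_url's host.

    get_links(url) is a caller-supplied function returning the URLs
    linked from that page. max_pages is a safety cap for huge sites.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}            # every page discovered so far
    queue = deque([start_url])    # pages still waiting to be scanned
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if urlparse(link).netloc != host:
                continue          # stay within the site
            if link not in seen:
                seen.add(link)
                queue.append(link)
                if len(seen) >= max_pages:
                    return len(seen)
    return len(seen)
```

The `seen` set is what stops the crawl from looping forever when pages link back to each other. A real crawler should also respect robots.txt and pause between requests, neither of which is shown here.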

You can also try parsing the sitemap if they provide one.
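If the site does publish a sitemap (commonly at /sitemap.xml, or listed in robots.txt), counting its entries is much cheaper than crawling. A minimal sketch, assuming the file follows the standard sitemaps.org XML format and that you have already downloaded its contents; `sitemap_urls` is a made-up name:

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap files that follow the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the <loc> values listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

`len(sitemap_urls(text))` then gives the page count the site itself advertises. Note that large sites often publish a sitemap index whose entries point at further sitemap files rather than pages, so you may need to fetch and parse each of those in turn. The sitemap may also omit pages or list stale ones.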

If you're using Java, JSpider might prove useful; for PHP, there's Sphider.

Upvotes: 2
