Klaaz

Reputation: 1640

How to handle Google indexing 'pages' that do not exist

I build dynamic websites whose structure is stored hierarchically in the database (my own CMS). I use the adjacency list model to manage these database tables (PHP and MySQL through PDO).

I detected that Google is indexing pages that it should not.

An example of a tree structure used for navigation:

home
  about us
  products
    productgroup 1
    productgroup 2
  contact
    support
    sales

Imagine this structure in a pulldown menu with links to the pages. When I select products->productgroup 1 I get a url like www.domain.com/products/productgroup-1 which pulls the data from the database (based on the last uri element: productgroup-1, a slug version of the title) and shows it in my template. I do not query all elements, only the last (I should, I know).
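For illustration, checking every element of the URI instead of only the last one could look roughly like the sketch below. It is written in Python with an in-memory SQLite table to keep it short; the same query pattern should carry over to PHP and PDO. The table name `pages`, its columns, and the function `resolve_path` are assumptions made for this example, not my actual schema.

```python
import sqlite3

# Hypothetical adjacency-list table: every row points at its parent row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, parent_id INTEGER, slug TEXT)"
)
conn.executemany(
    "INSERT INTO pages (id, parent_id, slug) VALUES (?, ?, ?)",
    [
        (1, None, "home"),
        (2, 1, "about-us"),
        (3, 1, "products"),
        (4, 3, "productgroup-1"),
        (5, 3, "productgroup-2"),
        (6, 1, "contact"),
        (7, 6, "support"),
        (8, 6, "sales"),
    ],
)

ROOT_ID = 1  # assumption: "home" is the root and does not appear in the URL


def resolve_path(path):
    """Return the page id for a URL path, or None if any segment does not match."""
    segments = [s for s in path.split("/") if s]
    current_id = ROOT_ID
    for slug in segments:
        # Each segment must be a direct child of the previously matched page.
        row = conn.execute(
            "SELECT id FROM pages WHERE parent_id = ? AND slug = ?",
            (current_id, slug),
        ).fetchone()
        if row is None:
            return None  # unknown segment: the caller should answer with a 404
        current_id = row[0]
    return current_id
```

With this, `/products/productgroup-1` resolves to a page, while a made-up path such as `/contact/productgroup-1` does not, even though its last element is a valid slug.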

So far so good. Google is indexing this page as expected:

http://www.domain.com/products/productgroup-1

But when I check Google Webmaster Tools, I see a lot of pages indexed with 404s, like:

http://www.domain.com/products
http://www.domain.com/contact

And so forth.

These pages are empty and have no link in the navigation structure.

I have designed my structure so that these pages return a 404 error. Webmaster Tools confirms this, but Google keeps indexing these pages. I know I can use robots.txt to keep Google's search bot away from these URLs. Is there another way to do this? Should I return a 403 instead of a 404?

I am in the dark here.

Upvotes: 0

Views: 1950

Answers (2)

Kirti

Reputation: 7

A 301 redirect is the best option for pages you do not want. You can also list those pages in robots.txt.

Upvotes: 0

maskie

Reputation: 456

You should do a few things:

  1. Use a 301 (permanent) redirect to send these empty pages to a relevant page.

  2. Submit a sitemap to Google Webmaster Tools.

    • This is a definitive list of URLs in your site.

    • Having a sitemap will not remove the 404 URLs that Google has already indexed, but it will inform Google of all the "official" URLs on your site and the intended crawl frequency.

    • Read more from Google webmaster tools here.

  3. Check your HTML code for references to "/products" or "/contact". Googlebot would not be crawling these URLs unless something links to them.
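Points 1 and 2 can be sketched together as below, in Python only for brevity; the logic should translate directly to a PHP front controller. Everything named here (`PAGES`, `BASE_URL`, `first_child`, `response_for`, `sitemap_urls`) is hypothetical, and redirecting an empty parent to its first child is just one possible choice of "relevant page". The home page is left out to keep the example small.

```python
# Hypothetical copy of the navigation tree: URL path -> does the page have content?
PAGES = {
    "/about-us": True,
    "/products": False,
    "/products/productgroup-1": True,
    "/products/productgroup-2": True,
    "/contact": False,
    "/contact/support": True,
    "/contact/sales": True,
}

BASE_URL = "http://www.domain.com"


def first_child(path):
    """Return the first page directly below `path`, or None if it has no children."""
    prefix = path + "/"
    for candidate in PAGES:  # dicts preserve insertion order
        if candidate.startswith(prefix) and "/" not in candidate[len(prefix):]:
            return candidate
    return None


def response_for(path):
    """Return (status_code, redirect_url) for a requested path."""
    path = "/" + path.strip("/")
    if path not in PAGES:
        return 404, None
    if PAGES[path]:
        return 200, None
    # Empty container page: send visitors and crawlers to a real page instead.
    target = first_child(path)
    if target is None:
        return 404, None
    return 301, BASE_URL + target


def sitemap_urls():
    """List only the URLs that serve real content, for use in a sitemap."""
    return [BASE_URL + path for path, has_content in PAGES.items() if has_content]
```

Here `response_for("/products")` yields a 301 to `http://www.domain.com/products/productgroup-1`, and `sitemap_urls()` leaves out `/products` and `/contact`, so the sitemap and the redirects tell Google the same story.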

Upvotes: 2
