I build dynamic websites whose structure is stored hierarchically in the database (my own CMS). I use the adjacency list model to manage these tables (PHP and MySQL through PDO).
I noticed that Google is indexing pages that it should not.
An example of a tree structure used for navigation:
home
about us
products
    productgroup 1
    productgroup 2
contact
    support
    sales
Imagine this structure in a pull-down menu with links to the pages. When I select products -> productgroup 1, I get a URL like www.domain.com/products/productgroup-1, which pulls the data from the database (based on the last URI segment, productgroup-1, a slug of the title) and renders it in my template. I do not look up every segment of the path, only the last one (I know I should).
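Looking up only the final slug means any prefix is accepted, so the path itself is never validated. A minimal sketch of resolving the whole path instead, written in Python for illustration (the site itself uses PHP/PDO) and assuming a hypothetical in-memory dict in place of the adjacency-list table; in practice each `find_child` call would be one query on the parent id and slug columns, whose real names are not given here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the adjacency-list table:
# page id -> (parent id, slug, has_content)
PAGES: Dict[int, Tuple[Optional[int], str, bool]] = {
    1: (None, "home", True),
    2: (None, "about-us", True),
    3: (None, "products", False),
    4: (3, "productgroup-1", True),
    5: (3, "productgroup-2", True),
    6: (None, "contact", False),
    7: (6, "support", True),
    8: (6, "sales", True),
}


def find_child(parent_id: Optional[int], slug: str) -> Optional[int]:
    """Return the id of the page with this slug directly under parent_id."""
    for page_id, (parent, page_slug, _) in PAGES.items():
        if parent == parent_id and page_slug == slug:
            return page_id
    return None


def resolve_path(path: str) -> Optional[int]:
    """Walk the URI from the root; each segment must be a child of the previous one.

    An empty path resolves to nothing (the home page would be mapped separately).
    """
    page_id: Optional[int] = None
    for slug in (s for s in path.strip("/").split("/") if s):
        page_id = find_child(page_id, slug)
        if page_id is None:
            return None
    return page_id


def status_for(path: str) -> int:
    """200 for a valid page with content; 404 for unknown paths and empty parent pages."""
    page_id = resolve_path(path)
    if page_id is None or not PAGES[page_id][2]:
        return 404
    return 200
```

With this check, `/contact/productgroup-1` or a bare `/productgroup-1` no longer resolves, even though the last slug exists.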
So far so good. Google is indexing this page as expected:
http://www.domain.com/products/productgroup-1
But when I look in Google Webmaster Tools, I see a lot of indexed pages returning 404s, like:
http://www.domain.com/products
http://www.domain.com/contact
And so forth.
These pages are empty and have no link in the navigation structure.
I designed my structure so that these pages return a 404 error. Webmaster Tools confirms this, but Google keeps indexing them. I know I can use robots.txt to disallow Googlebot and keep it from indexing these URLs. Is there another way to do this? Should I return a 403 instead of a 404?
I am in the dark here.
A 301 redirect is the best option for pages you don't want, and you can also list those pages in robots.txt.
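Before relying on robots.txt, it is worth checking what a rule actually matches, because `Disallow` lines are prefix matches. This sketch uses Python's standard `urllib.robotparser` with a hypothetical robots.txt; it shows that disallowing `/products` also blocks `/products/productgroup-1`, the page that should stay indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to hide the empty parent pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products
Disallow: /contact
"""


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above let `agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.domain.com/products",
        "http://www.domain.com/products/productgroup-1",  # also blocked: prefix match
        "http://www.domain.com/about-us",
    ):
        print(url, allowed(url))
```

Google documents a `$` end-of-URL anchor (e.g. `Disallow: /products$`) that avoids this, but `urllib.robotparser` does not implement it. Also keep in mind that robots.txt controls crawling: a blocked URL can still appear in Google's index if other pages link to it.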
You should do a few things:
Use a 301 (permanent) redirect to send these empty pages to a relevant page:
Even if Google stops crawling http://www.domain.com/products, some people may still reach it by deleting the last segment of the URL in their browser. You probably want to show them something relevant rather than a 404.
For example, you can redirect http://www.domain.com/products AND http://www.domain.com/products/ to http://www.domain.com/products/productgroup-1
Learn more about 301 redirection from Moz
You can also use mod_rewrite to issue 301 redirects instead of doing it in application code.
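A minimal sketch of the redirect decision, again in Python for illustration. The `REDIRECTS` mapping (including the `/contact` target) is hypothetical, and the function only computes the status and `Location` header that a real request handler would send; both `/products` and `/products/` are treated as the same URL.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping of empty parent pages to a relevant page with content.
REDIRECTS: Dict[str, str] = {
    "/products": "/products/productgroup-1",
    "/contact": "/contact/support",
}


def normalize(path: str) -> str:
    """Drop trailing slashes so /products and /products/ match the same entry."""
    return path.rstrip("/") or "/"


def redirect_for(path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (301, headers) if the path is an empty parent page, else None."""
    target = REDIRECTS.get(normalize(path))
    if target is None:
        return None
    return 301, {"Location": target}
```

In PHP the same response can be sent with `header('Location: /products/productgroup-1', true, 301);` before any output, or in an .htaccess file with a mod_rewrite rule such as `RewriteRule ^products/?$ /products/productgroup-1 [R=301,L]`.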
Submit a sitemap to google webmaster tools.
A sitemap is the definitive list of URLs on your site.
Having a sitemap will not remove the 404 URLs already indexed by Google, but it will tell Google about all the "official" URLs on your site and how often you intend them to be crawled.
Read more in the Google Webmaster Tools documentation.
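A sitemap can be generated from the same tree, listing only pages that actually have content. A rough Python sketch using the standard `xml.etree.ElementTree` module; the page list passed in and the `weekly` change frequency are assumptions, not values from the question.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: Iterable[str], changefreq: str = "weekly") -> str:
    """Return sitemap XML with one <url> entry per path (only pages with content)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical list: the empty parents /products and /contact are left out.
    print(build_sitemap("http://www.domain.com", [
        "/products/productgroup-1",
        "/products/productgroup-2",
        "/contact/support",
        "/contact/sales",
    ]))
```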
Check your HTML for links to "/products" or "/contact". Googlebot would not be crawling these URLs unless something links to them.
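One way to do that check is to scan the rendered menu HTML for `<a href>` values pointing at the empty pages. A small Python sketch with the standard `html.parser` module; the sample HTML is hypothetical, and links added by JavaScript or found on other sites would not show up this way.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect every href value found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links_to(html: str, suspects: Set[str]) -> List[str]:
    """Return hrefs whose path (ignoring a trailing slash) is one of `suspects`."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [h for h in collector.hrefs if urlparse(h).path.rstrip("/") in suspects]


if __name__ == "__main__":
    menu = '<li><a href="/products">Products</a><a href="/products/productgroup-1">Group 1</a></li>'
    print(find_links_to(menu, {"/products", "/contact"}))
```

A parent menu item with a real `href`, which many pull-down menus render by default, is a likely source of these URLs.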