Reputation: 69
I've taken over management of a site that has hundreds of pages which are no longer in the sitemap but are still indexed by Google. Many of these lead to products and services we no longer offer. This normally wouldn't be a big bother, but these pages have lots of problems that are hurting our SEO, and I'd rather focus on the pages that are supposed to be live than keep doing SEO work on pages no one is meant to see.
While I could go through and delete these pages, that makes me nervous; I'd prefer to keep a record of all previous pages, preferably in the directories where they existed. Is there a way to have Google index only the pages that are part of the current, active sitemap structure, without having to go one-by-one and remove them all from their various directories throughout the site?
Any pointers would be much appreciated!
Upvotes: 1
Views: 514
Reputation: 691
One straightforward way to keep Google from indexing content is to add a robots.txt file to your webroot directory, for example:
User-agent: *
Disallow: /section-noindex
Disallow: /url-no-index.html
Another way to prevent content from being indexed is to add the following meta tag inside the head tag of the HTML:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
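For placement, here is a minimal sketch of what one of the retired pages might look like with that tag in its head (the page title and content are just illustrations, not anything from your site):
<!DOCTYPE html>
<html>
  <head>
    <title>Discontinued product (example)</title>
    <!-- tells crawlers not to index this page or follow its links -->
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>
    ...
  </body>
</html>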
Upvotes: 0