Reputation: 117
When I was developing my site, I made a typo in one place. For example, all my pages are dir1/dir2/page.htm/par1-par2, but my typo was dir1/dir2/page/par1-par2 (note: without .htm).
It was in production for only one day, but Google keeps crawling those links. How can I stop Google from doing that?
By the way, that's not 1 page, but hundreds or thousands of pages.
Upvotes: 0
Views: 663
Reputation: 365
Try using robots.txt to deny crawlers access to these URLs:
http://www.robotstxt.org/robotstxt.html
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
You can test your robots.txt here: http://www.frobee.com/robots-txt-check/
Patterns must begin with / because robots.txt patterns always match absolute URLs.
* matches zero or more of any character.
$ at the end of a pattern matches the end of the URL; elsewhere $ matches itself.
* at the end of a pattern is redundant, because robots.txt patterns always match any URL which begins with the pattern.
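The four rules above can be sketched as a small matcher. This is only an illustration of the rules as stated here, not Google's actual implementation; the function names `robots_pattern_to_regex` and `is_blocked` are made up for this example.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex (sketch)."""
    if not pattern.startswith("/"):
        raise ValueError("robots.txt patterns must begin with '/'")
    # A trailing '$' anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches zero or more of any character; everything else,
    # including a '$' that is not at the end, is taken literally.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the URL path falls under the Disallow pattern."""
    # re.match only anchors at the start, which gives the
    # "any URL which begins with the pattern" behaviour.
    return robots_pattern_to_regex(pattern).match(path) is not None
```

With this sketch, the pattern `/dir1/dir2/page/` covers every mistyped URL from the question while leaving `/dir1/dir2/page.htm/par1-par2` alone, because the prefix stops matching at the `.` in `page.htm`.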
Upvotes: 2
Reputation: 46602
If the page still exists (perhaps because you're using mod_rewrite) and renders a custom "page not found" page without sending an HTTP 410 Gone header, e.g. header("HTTP/1.0 410 Gone");
then Google won't know it has been removed and will index it just the same.
You need to send the proper headers, remove the page, or stop rendering your own 404 so that the request hits your server's 404. Google will then remove the page from the index, although the removal won't happen overnight.
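As a rough illustration of the status-code point above, here is a minimal sketch written as a Python WSGI app rather than PHP. It assumes the mistyped URLs all start with /dir1/dir2/page/ (the placeholder path from the question); the names `TYPO_URL` and `app` are hypothetical, and a real site would hand other requests to its normal page handler.

```python
import re

# Assumption: mistyped URLs look like /dir1/dir2/page/<params>, while the
# real pages live under /dir1/dir2/page.htm/<params>.
TYPO_URL = re.compile(r"^/dir1/dir2/page/")


def app(environ, start_response):
    """Answer 410 Gone for the mistyped URLs and 200 OK for everything else."""
    path = environ.get("PATH_INFO", "")
    if TYPO_URL.match(path):
        # The status line is what the crawler acts on, not the body text.
        status = "410 Gone"
        body = b"This page has been permanently removed.\n"
    else:
        # Placeholder for the site's real page rendering.
        status = "200 OK"
        body = b"OK\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should be enough; requesting a mistyped URL should then show `410 Gone` in the response status line.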
You could also add the URL to a robots.txt file, but this is not guaranteed to remove the page from the index. You could contact Google as others have said, but a response or removal is not guaranteed either.
User-agent: *
Disallow: /dir1/dir2/page/par1-par2
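A quick way to sanity-check a rule like this is Python's standard `urllib.robotparser`. The sketch below assumes a placeholder host (`example.com`) and a made-up helper named `allowed`. Note that this parser does plain prefix matching only, so it is a rough check and not a stand-in for how Googlebot evaluates wildcards.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above, parsed from a string instead of a live site.
RULES = """\
User-agent: *
Disallow: /dir1/dir2/page/par1-par2
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str) -> bool:
    """Return True if a crawler may fetch the path under RULES."""
    # example.com is a placeholder host; only the path matters here.
    return parser.can_fetch("Googlebot", "http://example.com" + path)
```

Here `allowed("/dir1/dir2/page/par1-par2")` comes back `False` while the correct `/dir1/dir2/page.htm/par1-par2` stays fetchable. A different mistyped URL such as `/dir1/dir2/page/other` is still allowed, so with hundreds of affected pages a shorter prefix like `/dir1/dir2/page/` would be needed to cover them all.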
Good luck.
Upvotes: 1
Reputation: 918
Google has a form where you can ask it to remove a page from its index.
Check out the info at this link:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=164734
Upvotes: -1