Noam

Reputation: 3391

Stopping Google's crawl of my site

Google has started crawling my site, but from a temporary domain (beta.mydomain instead of just mydomain), and I only want it to crawl some of my pages. So I want to stop the current crawl and let Google crawl only the pages I specify in a sitemap. How can I do that? (I know how to add a sitemap, but how do I stop the crawling that is already happening and request that they crawl just the sitemap?)

Update: If I kill the subdomain beta.mydomain, will that be "fine" by them, or will they keep going over all the killed pages and "not like" them? Can I specify that in each page's header?
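
(For that last part, I'm thinking of something like the standard robots meta tag:

<meta name="robots" content="noindex">

or the equivalent HTTP response header for non-HTML pages:

X-Robots-Tag: noindex

Is that the right mechanism?)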

Upvotes: 0

Views: 1153

Answers (3)

Robert

Reputation: 204

Create a single text file called 'robots.txt' in the root folder for your site. Inside...

User-agent: *
Disallow: /thisfolder/
Disallow: /foo.html
Disallow: /andthisfoldertoo/
Disallow: /andthisfile.html

I use this for project files. In fact, as I write this I think I'll change the way I work on projects and always put them in a sub-directory called /projects/project1/ so one line will do...

Disallow: /projects/

AND I also add a line for my image files. I don't like my images all over the web...

Disallow: /imgs/
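
One more note for the subdomain in your question: robots.txt works per host, so to block crawling of beta.mydomain the file has to be served from the root of that subdomain itself. There, a blanket rule covers everything (a sketch, assuming the beta host has its own document root):

User-agent: *
Disallow: /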

Upvotes: 2

Roger

Reputation: 15813

You could start with a robots.txt file.

See Google's info here.

From what you say, I presume you have already looked at Webmaster Tools and sitemaps? Do be aware that while a sitemap will help tell Google WHAT to crawl, it won't work very well for telling them what NOT to crawl.

For that you will want to use the robots.txt file to block certain pages/folders.
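
The two can live in the same file: robots.txt also accepts a Sitemap directive, so one file can block the folders you want ignored and point Google at the sitemap listing the pages you do want crawled. A sketch, with placeholder paths and URL:

User-agent: *
Disallow: /private/
Disallow: /old/

Sitemap: http://mydomain/sitemap.xml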

Upvotes: 1

pyroscope

Reputation: 4158

Use a robots.txt; see this site.

Upvotes: 1
