Geoffrey

Reputation: 467

Block crawlers from subdomain via htaccess

Our company has temp development URLs that are being indexed by search engines. We need to stop this via a global .htaccess file. By global, I mean I want to drop a single file into our root that applies the rules to every site; every time we build a new site, I don't want to have to drop an .htaccess file into that folder.

I am terrible at writing .htaccess rules, otherwise I would have done this myself. I would appreciate any input from the community.

Here is an example temp URL: 1245.temp.oursite.com

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AltaVista [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp
RewriteRule ^.*$ "http\:\/\/oursite\.com" [R=301,L]

I've tried playing with this, but as I stated above, I'm terrible at writing .htaccess rules.

Edit: This question is similar to this one; however, mine involves subdomains.

Upvotes: 2

Views: 1107

Answers (2)

Panama Jack

Reputation: 24458

If you simply want a universal file to block robots, you can use something like this. It is not specific to any one domain.

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*(AltaVista|Googlebot|msnbot|Slurp).*$ [NC]
RewriteRule .* - [F,L]

Edit: If your subdomains are accessible from the main root .htaccess file, you can use a method like this, and it should block access for any temp subdomain.

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*(AltaVista|Googlebot|msnbot|Slurp).*$ [NC]
RewriteCond %{HTTP_HOST} ^([0-9]+)\.temp\.oursite\.com$ [NC]
RewriteRule .* - [F,L]
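
The [F] flag answers matching requests with 403 Forbidden, so crawlers presenting one of those user agents can no longer fetch anything from the temp subdomains. As an alternative sketch that doesn't require enumerating user agents, and assuming your server also loads mod_setenvif and mod_headers (check your enabled modules), you could instead send an X-Robots-Tag response header on temp hosts, which asks compliant crawlers not to index the pages:

# Flag requests whose Host header is a numeric temp subdomain
SetEnvIfNoCase Host ^[0-9]+\.temp\.oursite\.com$ TEMP_SITE
# Tell compliant crawlers not to index or follow anything on those hosts
Header set X-Robots-Tag "noindex, nofollow" env=TEMP_SITE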

Upvotes: 2

ceejayoz

Reputation: 180023

If you don't want search engines to index the sites, add a robots.txt file to those subdomains. It should contain:

User-agent: *
Disallow: /

All major search engines respect the Web Robots standard.
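
One way to keep this global rather than per-site, as the question asks, is a rewrite in the root .htaccess that serves a single shared file for every temp subdomain. This is a sketch that assumes the subdomains share the main document root (as in the edit above) and that a file named robots-disallow.txt (a hypothetical name) exists in that root:

RewriteEngine on
# Only rewrite requests whose host is a numeric temp subdomain
RewriteCond %{HTTP_HOST} ^[0-9]+\.temp\.oursite\.com$ [NC]
# Serve the shared disallow-everything file in place of robots.txt
RewriteRule ^robots\.txt$ /robots-disallow.txt [L]

robots-disallow.txt would contain the two lines above, while the main site's own robots.txt is left alone because its host doesn't match the condition.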

Upvotes: 4
