Reputation: 467
Our company has temp development URLs that are being indexed by search engines. We need to stop this via a global .htaccess file. By global, I mean I want to drop one .htaccess file into our root that applies the rules to every site. I don't want to have to drop an .htaccess file into each folder every time we build a new site.
I am terrible at writing .htaccess rules, otherwise I would have done it myself. I would appreciate any input from the community.
Here is an example temp URL: 1245.temp.oursite.com
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AltaVista [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp
RewriteRule ^.*$ "http\:\/\/oursite\.com" [R=301,L]
I've tried playing with this, but as I stated above, I'm terrible at writing .htaccess rules.
Edit: This question is similar to this one; however, mine involves sub-domains.
Upvotes: 2
Views: 1107
Reputation: 24458
If you simply want a universal file to block robots, you can use something like this. It is not specific to any domain.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*(AltaVista|Googlebot|msnbot|Slurp).*$ [NC]
RewriteRule .* - [F,L]
Edit: If your subdomains are governed by the main root .htaccess file, then you can use a method like this; it should block access on any temp domain.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*(AltaVista|Googlebot|msnbot|Slurp).*$ [NC]
RewriteCond %{HTTP_HOST} ^([0-9]+)\.temp\.oursite\.com$ [NC]
RewriteRule .* - [F,L]
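If you would rather send crawlers back to the live site (as in your original attempt) instead of returning a 403 Forbidden, something along these lines should also work; oursite.com is just standing in for your production domain:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*(AltaVista|Googlebot|msnbot|Slurp).*$ [NC]
RewriteCond %{HTTP_HOST} ^([0-9]+)\.temp\.oursite\.com$ [NC]
# Redirect crawlers hitting a temp subdomain to the live domain
RewriteRule .* http://oursite.com/ [R=301,L]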
Upvotes: 2
Reputation: 180023
If you don't want search engines to index the sites, add a robots.txt
file to those subdomains. It should contain:
User-agent: *
Disallow: /
All major search engines respect the Web Robots standard.
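If you want to keep everything in the single root .htaccess rather than dropping a robots.txt into every site folder, one untested sketch (assuming the temp subdomains all share that document root, and using temp-robots.txt as a placeholder file name) is to internally rewrite robots.txt requests on the temp hosts to one shared blocking file:
RewriteEngine on
# On any numbered temp subdomain, serve a shared "disallow everything" file as robots.txt
RewriteCond %{HTTP_HOST} ^([0-9]+)\.temp\.oursite\.com$ [NC]
RewriteRule ^robots\.txt$ /temp-robots.txt [L]
Here temp-robots.txt would contain the two lines shown above, while the real robots.txt for oursite.com is left untouched.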
Upvotes: 4