Matthias Max

Reputation: 595

Correctly redirect bot requests to static version of a website

I'm having trouble getting Google to index my website correctly.

My folder structure looks like this:

root
 - cms
 - www

example.com points to the root where a .htaccess routes all requests to /www:

RewriteEngine on
RewriteRule ^(.*)$ /www/$1 [L]

Front end

The Angular front end inside /www gets its data from /cms via a REST API. So far so good.

What I want to achieve is that bots don't crawl my ajaxified /www pages but instead crawl /cms, where I print static content that mirrors the URL structure of /www.

URL for static content:

/www/test1 -> Outputs nice content via REST

/cms/test1 -> Outputs text-only content for the crawler

Bot redirect

I'm redirecting the bots coming to example.com/www to /cms like this:

RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]

Site map

I also registered a sitemap with Google with the following contents:

http://www.example.com/test1
http://www.example.com/test2
and so on...

The problem

This all works fine, BUT Google is also crawling the static content inside /cms directly, without having been redirected there by me. I only want this static directory to be served through the redirect, not discovered by Google's bot on its own. It's like "disallowing" the bot from crawling there, yet on the other hand I NEED it to crawl there. A catch-22, in my opinion.

Edit: complete .htaccess file

RewriteEngine On

# Sitemap
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.xml(\.gz)?$ /cms/sitemap$1.xml$2 [L]
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.html(\.gz)?$ /cms/sitemap$1.xml$2 [L]

# Redirect bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]

# Angular HTML5 mode: Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index

# Angular HTML5 mode: Rewrite everything else to index.html to allow html5 state links
RewriteRule (.*) /www/index.html [L]

Edit 2

I have added this tag to the www page

<meta name="fragment" content="!"> 

to let the crawler know that the page uses AJAX. I'm also using the rewrite suggested by @Croises, but now in response to Google's _escaped_fragment_ re-request. Let's wait a few days...

RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
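
A possible variant of the rule above, offered as a sketch rather than a tested configuration: it keys only on the _escaped_fragment_ parameter and drops the user-agent check, on the assumption that in practice only crawlers following the AJAX crawling scheme send that parameter. The anchored pattern `(^|&)_escaped_fragment_=` is an illustrative tightening, not part of the original rule.

```apache
# Match only requests that carry the _escaped_fragment_ parameter,
# either as the first parameter or after an "&".
RewriteCond %{QUERY_STRING} (^|&)_escaped_fragment_=
# Skip requests already under /cms/ so the rule cannot match its own output.
RewriteCond %{REQUEST_URI} !^/cms/
# Internal rewrite (no R flag): the substitution has no query string of
# its own, so mod_rewrite passes the original query string along unchanged.
RewriteRule ^(.*)$ cms/$1 [L]
```

Whether dropping the user-agent condition is acceptable depends on how much you care about ordinary visitors reaching the static pages by adding the parameter by hand.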

Upvotes: 1

Views: 3429

Answers (1)

Croises

Reputation: 18671

You can't redirect bots to a static page and expect them to index or reference the original URL without crawling the "real" content. After a 301, the crawler treats the redirect target as the page to index.

Instead, you can rewrite the request internally:

# Rewrite bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]

This is the same rule, just without R=301. That way the static page is served under the original URL and no redirect is sent to the client.
But beware of cloaking (see "Google and Cloaking").
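
To show where the rule fits, here is a rough sketch of the root .htaccess from the question with the 301 block swapped for the internal rewrite. The extra `!^/cms/` guard on the Angular fallback is an assumption added here, not something from the question: without it, and if /cms has no .htaccess of its own, the rewritten bot request could fall through to /www/index.html on the next pass.

```apache
RewriteEngine On

# The sitemap rules from the question would stay here, unchanged.

# Bots: serve the static /cms version internally. There is no R flag,
# so the URL the crawler requested stays the same. The !^/cms/ guard
# keeps the rule from matching its own output on the next pass.
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]

# Everyone else: Angular HTML5 mode fallback, as in the question.
# Assumption: the first condition is added so that requests already
# rewritten to /cms/ are not sent on to index.html.
RewriteCond %{REQUEST_URI} !^/cms/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
RewriteRule (.*) /www/index.html [L]
```

Note that this still lets bots request /cms/... URLs directly, so it does not by itself stop Google from crawling /cms on its own.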

Upvotes: 1
