Reputation: 595
I'm having problems getting my website indexed correctly by Google. The directory structure is:
root
- cms
- www
example.com points to the root, where an .htaccess file routes all requests to /www:
RewriteEngine on
RewriteRule ^(.*)$ /www/$1 [L]
The Angular front end inside /www gets its data from /cms via a REST API. So far so good.
What I want to achieve is that bots don't crawl my ajaxified /www page but instead crawl /cms, where I print static content that mirrors the URL structure of /www:
/www/test1 -> Outputs nice content via REST
/cms/test1 -> Outputs text-only content for the crawler
I'm redirecting the bots coming to example.com/www to /cms like this:
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]
I also registered a sitemap with Google with the following contents:
http://www.example.com/test1
http://www.example.com/test2
and so on...
This all works fine, BUT: Google is also crawling the static content inside /cms directly, without being redirected there by me. I only want this static subdirectory to be reached through the redirect, not when Google's bot discovers it on its own. In effect I want to "disallow" the bot from crawling it, but on the other hand I NEED the bot to crawl it. A catch-22, in my opinion.
This is my current .htaccess:
RewriteEngine On
# Sitemap
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.xml(\.gz)?$ /cms/sitemap$1.xml$2 [L]
RewriteRule ^sitemap(-+([a-zA-Z0-9_-]+))?\.html(\.gz)?$ /cms/sitemap$1.xml$2 [L]
# Redirect bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteRule ^(.*)$ http://www.example.com/cms/$1 [R=301,L]
# Angular HTML5 mode: Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
# Angular HTML5 mode: Rewrite everything else to index.html to allow html5 state links
RewriteRule (.*) /www/index.html [L]
I have added this tag to the www page:
<meta name="fragment" content="!">
to let the crawler know that the page uses AJAX. I'm also using the rewrite suggested by @Croises, but only in response to Google's _escaped_fragment_ re-request. Let's wait a few days...
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
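To check by hand which requests these three conditions let through, here is a minimal Python sketch that mimics them. It is only a model for reasoning about cases, not how Apache actually evaluates rules. The function name `escaped_fragment_target` is hypothetical, and the sample user agent string in the usage note is illustrative.

```python
import re

# Same alternation as the RewriteCond on HTTP_USER_AGENT; [NC] means case-insensitive.
BOT_PATTERN = re.compile(r"(googlebot|yahoo|bingbot|baiduspider)", re.IGNORECASE)


def escaped_fragment_target(user_agent, request_uri, query_string):
    """Return the internally rewritten path, or None if the rule does not apply."""
    # RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
    if not BOT_PATTERN.search(user_agent):
        return None
    # RewriteCond %{QUERY_STRING} _escaped_fragment_  (unanchored, so a substring match)
    if "_escaped_fragment_" not in query_string:
        return None
    # RewriteCond %{REQUEST_URI} !^/cms/  (skip requests already under /cms/)
    if request_uri.startswith("/cms/"):
        return None
    # RewriteRule ^(.*)$ cms/$1 [L]  (in .htaccess the pattern sees the path
    # without its leading slash, and the target is relative to the directory)
    return "/cms/" + request_uri.lstrip("/")
```

For example, `escaped_fragment_target("Mozilla/5.0 (compatible; Googlebot/2.1)", "/test1", "_escaped_fragment_=")` gives `"/cms/test1"` under this model, while the same request without the query parameter gives `None`.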
Upvotes: 1
Views: 3429
Reputation: 18671
You can't redirect bots to a static page and then ask them to index or reference the final page without crawling the "real" content.
Instead, you can rewrite the request internally:
# Rewrite bots to static pages
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{REQUEST_URI} !^/cms/
RewriteRule ^(.*)$ cms/$1 [L]
This is the same rule, just without R=301. That way the static page is served without a redirect.
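The difference between the two flags can be sketched as a small Python model of what the crawler observes in each case. This is an illustration under assumptions, not Apache's implementation: the `Outcome` class and both function names are hypothetical, and the 200 status assumes the static page under /cms exists.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Outcome:
    status: int            # first status code the crawler receives
    url_crawler_sees: str  # URL the crawler ends up fetching the content from
    served_from: str       # server-side path that produces the response body


def with_redirect(requested_url):
    # [R=301,L]: the server answers with a Location header, so the crawler
    # makes a second request and sees the /cms/ URL.
    path = urlsplit(requested_url).path.lstrip("/")
    target = "http://www.example.com/cms/" + path
    return Outcome(301, target, "/cms/" + path)


def with_internal_rewrite(requested_url):
    # [L] only: the substitution happens inside the server, so the URL the
    # crawler requested stays the same. Status 200 is an assumption.
    path = urlsplit(requested_url).path.lstrip("/")
    return Outcome(200, requested_url, "/cms/" + path)
```

In both cases the body comes from /cms/test1 for a request to http://www.example.com/test1. Only with the redirect does the crawler learn the /cms/ address.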
But beware of cloaking (Google and Cloaking).
Upvotes: 1