Reputation: 2411
I am running a Rails app on Heroku with a custom domain. Let's call my Heroku app myapp.herokuapp.com and the custom domain www.myapp.com. I have accidentally gotten myapp.herokuapp.com indexed by Google (some 700-3000 indexed pages), causing duplicate content between the two domains.
I recently discovered this and put a 301 redirect in a before_filter in the application controller, like this:
def forward_from_heroku
  redirect_to "http://www.myapp.com#{request.path}", :status => 301 if request.host.include?('herokuapp')
end
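For reference, the filter itself is registered in the application controller; a minimal sketch of the wiring (Rails 3/4 syntax, since the app uses before_filter) might look like:
application_controller.rb
class ApplicationController < ActionController::Base
  # Run on every request so anything arriving on the herokuapp.com host is 301'd
  before_filter :forward_from_heroku
end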
This successfully redirects (almost) all traffic from myapp.herokuapp.com to www.myapp.com. I have also requested an address change to myapp.com in Google Webmaster Tools.
This works fine, except for files in the public folder (obviously). The problem is that Google still accesses robots.txt and sitemap.xml, which in turn point to an external sitemap (hosted on AWS). I can see how Googlebot would interpret this as meaning there is still content to be crawled on myapp.herokuapp.com (even though everything is 301'd).
What I would like to do is add code to the app so that if Google accesses the site through myapp.herokuapp.com it gets one sitemap.xml/robots.txt, and another if the site is accessed through www.myapp.com.
How can I code this in my config.rb or elsewhere? Basically, I need to bypass the public folder for myapp.herokuapp.com.
Upvotes: 0
Views: 762
Reputation: 2411
This is what I did. It's not elegant, but it works: I removed sitemap.xml and robots.txt from the public folder and put them in the config folder. Then:
routes.rb
get '/robots.txt' => 'home#robots'
get '/sitemap.xml' => 'home#sitemaps'
home_controller.rb
def robots
  unless request.host.eql?('myapp.herokuapp.com')
    robots = File.read(Rails.root + "config/robots.txt")
    render :text => robots, :layout => false, :content_type => "text/plain"
  end
end

def sitemaps
  unless request.host.eql?('myapp.herokuapp.com')
    sitemaps = File.read(Rails.root + "config/sitemap.xml")
    render :text => sitemaps, :layout => false, :content_type => "text/xml"
  end
end
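A variation on the same idea (a sketch only, not what I deployed) would be to give the Heroku host an explicit answer instead of an empty response, e.g. a disallow-all robots.txt, so the crawler gets an unambiguous signal:
def robots
  if request.host.eql?('myapp.herokuapp.com')
    # Explicitly tell crawlers there is nothing to index on the Heroku host
    render :text => "User-agent: *\nDisallow: /\n", :layout => false, :content_type => "text/plain"
  else
    render :text => File.read(Rails.root + "config/robots.txt"), :layout => false, :content_type => "text/plain"
  end
end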
Upvotes: 0
Reputation: 5998
You can constrain routes based on the domain:
scope constraints: {host: /^regex-matching-your-domain/} do
Then just return a 404 for robots.txt and sitemap.xml within that scope:
scope constraints: { host: /herokuapp\.com$/ } do
  get '/robots.txt' => Proc.new { |env|
    [404, {'Content-Type' => 'text/plain'}, ['Not Found']]
  }
  get '/sitemap.xml' => Proc.new { |env|
    [404, {'Content-Type' => 'text/plain'}, ['Not Found']]
  }
end
Also: you may consider using canonical URLs. It may be a more effective solution for SEO, I'm not sure. https://support.google.com/webmasters/answer/139066?hl=en
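If you go the canonical-URL route, a rough sketch (the helper name and hard-coded host are just illustrative) would be a helper that always points at the custom domain, rendered from the layout's head with <%= canonical_link_tag %>:
app/helpers/application_helper.rb
module ApplicationHelper
  # Build a <link rel="canonical"> tag that always points at the custom domain,
  # so pages served from myapp.herokuapp.com declare www.myapp.com as canonical.
  def canonical_link_tag
    tag(:link, :rel => "canonical", :href => "http://www.myapp.com#{request.path}")
  end
end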
Upvotes: 1