Reputation: 2411
I am running a Rails app on Heroku with a custom domain. Let's call my Heroku app myapp.herokuapp.com and the custom domain www.myapp.com. I have accidentally gotten myapp.herokuapp.com indexed by Google (some 700-3000 indexed pages), causing duplicate content between the two domains.
I recently discovered this and put a 301 redirect in a before_filter in the application controller, like this:
def forward_from_heroku
  redirect_to "http://www.myapp.com#{request.path}", :status => 301 if request.host.include?('herokuapp')
end
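For reference, the filter itself is registered in the application controller; a minimal sketch of the wiring (Rails 3/4 syntax, since the app uses before_filter) might look like:
application_controller.rb
class ApplicationController < ActionController::Base
  # Run on every request so anything arriving on the herokuapp.com host is 301'd
  before_filter :forward_from_heroku
end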
This successfully redirects (almost) all traffic from myapp.herokuapp.com to www.myapp.com. I have also requested an address change to myapp.com in Google Webmaster Tools.
This works fine, except for files in the public folder (obviously). The problem is that Google still accesses robots.txt and sitemap.xml, which in turn point to an external sitemap (hosted on AWS). I can see how Googlebot would interpret this as meaning there is still content to be crawled on myapp.herokuapp.com (even though everything is 301'd).
What I would like to do is add code to the app so that if Google accesses the site through myapp.herokuapp.com it gets one sitemap.xml/robots.txt, and another if the site is accessed through www.myapp.com.
How can I code this in my config.rb or elsewhere? Basically, I need to bypass the public folder for myapp.herokuapp.com.
Upvotes: 0
Views: 762
Reputation: 2411
This is what I did. It's not elegant, but it works: I removed sitemap.xml and robots.txt from the public folder and put them in the config folder. Then:
routes.rb
get '/robots.txt' => 'home#robots'
get '/sitemap.xml' => 'home#sitemaps'
home_controller.rb
def robots
  unless request.host.eql?('myapp.herokuapp.com')
    robots = File.read(Rails.root + "config/robots.txt")
    render :text => robots, :layout => false, :content_type => "text/plain"
  end
end

def sitemaps
  unless request.host.eql?('myapp.herokuapp.com')
    sitemaps = File.read(Rails.root + "config/sitemap.xml")
    render :text => sitemaps, :layout => false, :content_type => "text/xml"
  end
end
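A variation on the same idea (a sketch only, not what I deployed) would be to give the Heroku host an explicit answer instead of an empty response, e.g. a disallow-all robots.txt, so the crawler gets an unambiguous signal:
def robots
  if request.host.eql?('myapp.herokuapp.com')
    # Explicitly tell crawlers there is nothing to index on the Heroku host
    render :text => "User-agent: *\nDisallow: /\n", :layout => false, :content_type => "text/plain"
  else
    render :text => File.read(Rails.root + "config/robots.txt"), :layout => false, :content_type => "text/plain"
  end
end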
Upvotes: 0
Reputation: 5998
You can constrain routes based on the domain:
scope constraints: {host: /^regex-matching-your-domain/} do
Then just return a 404 for robots.txt and sitemap.xml within that scope:
scope constraints: { host: /herokuapp\.com$/ } do
  get '/robots.txt' => Proc.new { |env|
    [404, {'Content-Type' => 'text/plain'}, ['Not Found']]
  }
  get '/sitemap.xml' => Proc.new { |env|
    [404, {'Content-Type' => 'text/plain'}, ['Not Found']]
  }
end
Also: you may consider using canonical URLs. It may be a more effective solution for SEO, I'm not sure. https://support.google.com/webmasters/answer/139066?hl=en
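If you go the canonical-URL route, a rough sketch (the helper name and hard-coded host are just illustrative) would be a helper that always points at the custom domain, rendered from the layout's head with <%= canonical_link_tag %>:
app/helpers/application_helper.rb
module ApplicationHelper
  # Build a <link rel="canonical"> tag that always points at the custom domain,
  # so pages served from myapp.herokuapp.com declare www.myapp.com as canonical.
  def canonical_link_tag
    tag(:link, :rel => "canonical", :href => "http://www.myapp.com#{request.path}")
  end
end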
Upvotes: 1