tvieira

Reputation: 1915

How to serve a sitemap.xml using a reverse proxy in Rails?

Due to business requirements, I need to generate a new sitemap every time a page is added in the admin panel. We're on Heroku, so we use the sitemap_generator gem for this, and the sitemap is uploaded to S3 every time rake sitemap:refresh is called.

But the sitemap needs to be served from our own domain, e.g. https://example.org/sitemap.xml, so we decided to use a reverse proxy (with the rack-reverse-proxy gem).

In our config.ru we have:

use Rack::ReverseProxy do
  reverse_proxy '/sitemap.xml', 'http://our-bucket.amazonaws.com/sitemaps/sitemap.xml', :timeout => 15000,   :preserve_host => true
end

and our robots.txt file is:

User-Agent: *
Allow: /
Disallow: /admin

But when we submit the sitemap in Google Webmaster Tools, I get an error saying "URL restricted by robots.txt". When I try to access https://our_domain.com/sitemap.xml directly in the browser, I get:

<Error>
  <Code>InvalidArgument</Code>
  <Message>Unsupported Authorization Type</Message>
  <ArgumentName>Authorization</ArgumentName>

Accessing the S3 link http://our-bucket.s3.amazonaws.com/sitemaps/sitemap.xml directly, however, displays our sitemap.xml correctly.

Any ideas? Is what we're attempting to do even possible?

Upvotes: 0

Views: 774

Answers (2)

Daniel L.

Reputation: 11

I ran into this same issue because my app was behind Basic Auth, so the proxy was passing the Authorization header along to S3, which S3 rejected.

I resolved it by pinning rack-reverse-proxy to the commit below (for some reason the setting I needed didn't make it into the latest release tag):

gem 'rack-reverse-proxy', require: 'rack/reverse_proxy', git: 'https://github.com/waterlink/rack-reverse-proxy.git', ref: 'a4f28a6'

and adding the following setting to reverse_proxy:

config.middleware.use Rack::ReverseProxy do
  reverse_proxy_options stripped_headers: ['Authorization']
  # ... <rules here>
end
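
To make it concrete why removing the header helps, here is a minimal plain-Ruby sketch of the idea. It is not the gem's implementation: strip_request_headers is a hypothetical helper and the env values are made up. Rack exposes the Authorization request header as HTTP_AUTHORIZATION in the env hash, and that is the value that would otherwise be forwarded to S3.

```ruby
# Hypothetical helper, NOT part of rack-reverse-proxy: returns a copy of a
# Rack env hash without the named request headers.
def strip_request_headers(env, header_names)
  # Rack stores request headers as "HTTP_" + upcased name, with "-" as "_".
  keys = header_names.map { |name| "HTTP_#{name.upcase.tr('-', '_')}" }
  env.reject { |key, _value| keys.include?(key) }
end

# Made-up request env for illustration.
env = {
  'REQUEST_METHOD'     => 'GET',
  'PATH_INFO'          => '/sitemap.xml',
  'HTTP_AUTHORIZATION' => 'Basic dXNlcjpwYXNz', # sent because the site uses Basic Auth
  'HTTP_ACCEPT'        => 'application/xml'
}

forwarded = strip_request_headers(env, ['Authorization'])
```

With the Basic Auth credentials gone, the upstream request looks like an ordinary anonymous GET, which is what a publicly readable S3 object expects.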

This question is old, but hopefully this helps someone in the future.

Upvotes: 1

ErvalhouS

Reputation: 4216

You could create an action that responds with XML, like so, in app/controllers/sitemap_controller.rb:

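class SitemapController < ApplicationController
  layout false

  def index
    # Assumes a Page model (Rails model names are singular); adjust to yours.
    @my_pages = Page.all
    render formats: :xml
  end
end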

And the corresponding view file, app/views/sitemap/index.xml.builder:

base_url = request.url.chomp('sitemap.xml')

xml.instruct! :xml, version: '1.0', encoding: 'utf-8'

xml.tag! 'urlset',
  'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
  'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance',
  'xsi:schemaLocation' => 'http://www.sitemaps.org/schemas/sitemap/0.9
   http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd' do

  @my_pages.each do |page|
    xml.url {
      xml.loc URI.join(base_url, page.url)
    }
  end
end
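
As a quick illustration of how the view builds each loc value, the sketch below runs the same chomp and URI.join steps outside of Rails. The request URL and the page paths are made-up stand-ins for request.url and page.url.

```ruby
require 'uri'

# Made-up request URL; in the view this comes from request.url.
request_url = 'https://example.org/sitemap.xml'
base_url = request_url.chomp('sitemap.xml')

# Made-up page paths standing in for page.url.
paths = ['/about', 'contact', '/blog/first-post']

# URI.join resolves each path against the base and returns a URI object;
# to_s gives the absolute URL that ends up inside <loc>.
locs = paths.map { |path| URI.join(base_url, path).to_s }
```

Both absolute paths ('/about') and relative ones ('contact') resolve against the site root here, so page.url can be stored either way.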

Don't forget to create a route for it in config/routes.rb:

    get '/sitemap.xml', to: 'sitemap#index'

There is no need for a reverse proxy. You could also create a rake task to ping search engines, and you're good to go. Happy coding!

Upvotes: 0
