Reputation: 1915
Due to business requirements, we need to regenerate the sitemap every time a new page is added in the admin panel. We're on Heroku, so we looked into the sitemap_generator gem; we upload the sitemap to S3 every time rake sitemap:refresh is called.
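For context, our config/sitemap.rb looks roughly like this (a sketch assuming the fog-aws gem; the page model and credentials here are placeholders):
# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = 'https://example.org'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider:          'AWS',
  fog_directory:         'our-bucket',
  aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

SitemapGenerator::Sitemap.create do
  # One <url> entry per page created in the admin panel
  Page.find_each { |page| add page_path(page), lastmod: page.updated_at }
end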
But the sitemap needs to live under our own domain, e.g. https://example.org/sitemap.xml, so we decided to use a reverse proxy (via the rack-reverse-proxy gem).
In our config.ru we have:
use Rack::ReverseProxy do
  reverse_proxy '/sitemap.xml', 'http://our-bucket.amazonaws.com/sitemaps/sitemap.xml',
                timeout: 15000, preserve_host: true
end
And our robots.txt file is:
User-Agent: *
Allow: /
Disallow: /admin
But when we submit it in Google Webmaster Tools, we get an error saying "URL restricted by robots.txt". And when we try to access https://our_domain.com/sitemap.xml directly in the browser, we get:
<Error>
  <Code>InvalidArgument</Code>
  <Message>Unsupported Authorization Type</Message>
  <ArgumentName>Authorization</ArgumentName>
</Error>
But when we access the S3 link directly, http://our-bucket.s3.amazonaws.com/sitemaps/sitemap.xml, our sitemap.xml is displayed correctly.
Any ideas? Is what we're attempting to do even possible?
Upvotes: 0
Views: 774
Reputation: 11
I ran into this same issue because I was on a system behind Basic Auth, so it was passing along the Authorization header, which S3 did not like.
I resolved mine by updating rack-reverse-proxy to this commit (for some reason the setting I needed didn't make it into the latest release tag):
gem 'rack-reverse-proxy', require: 'rack/reverse_proxy', git: 'https://github.com/waterlink/rack-reverse-proxy.git', ref: 'a4f28a6'
and adding the following option inside the Rack::ReverseProxy block:
config.middleware.use Rack::ReverseProxy do
  reverse_proxy_options stripped_headers: ['Authorization']
  # ... <rules here>
end
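For reference, in a plain config.ru setup like the one in the question, that would look something like this (same proxy rule as the question; only the stripped_headers line is new):
use Rack::ReverseProxy do
  # Drop the incoming Authorization header so S3 doesn't reject the request
  reverse_proxy_options stripped_headers: ['Authorization']
  reverse_proxy '/sitemap.xml', 'http://our-bucket.s3.amazonaws.com/sitemaps/sitemap.xml',
                timeout: 15000, preserve_host: true
end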
This question is old, but hopefully this helps someone in the future.
Upvotes: 1
Reputation: 4216
You could create an action that responds with XML, like so, in app/controllers/sitemap_controller.rb:
class SitemapController < ApplicationController
  layout false

  def index
    @my_pages = Page.all
    render formats: :xml
  end
end
And the corresponding view file, app/views/sitemap/index.xml.builder:
base_url = request.url.chomp('sitemap.xml')

xml.instruct! :xml, version: '1.0', encoding: 'utf-8'
xml.tag! 'urlset',
         'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
         'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance',
         'xsi:schemaLocation' => 'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd' do
  @my_pages.each do |page|
    xml.url do
      xml.loc URI.join(base_url, page.url)
    end
  end
end
Don't forget to create a route for it in config/routes.rb:
get '/sitemap.xml', to: 'sitemap#index'
No need for a reverse proxy. You could also create a rake task to ping the search engines, and you're good to go.
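For example, a minimal sketch in lib/tasks/sitemap.rake (the sitemap URL is a placeholder, and the ping endpoints are the classic Google/Bing ones):
require 'net/http'
require 'cgi'

namespace :sitemap do
  desc 'Notify search engines that the sitemap has changed'
  task ping: :environment do
    # URL-encode the sitemap location before appending it as a query parameter
    sitemap_url = CGI.escape('https://example.org/sitemap.xml')
    ["https://www.google.com/ping?sitemap=#{sitemap_url}",
     "https://www.bing.com/ping?sitemap=#{sitemap_url}"].each do |ping_url|
      Net::HTTP.get_response(URI(ping_url))
    end
  end
end
Happy coding!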
Upvotes: 0