Reputation: 2562
I'm getting quite a few failed requests on my server and they're mostly from web crawlers that encounter URLs with single quotes in them.
For example, the sitemap lists
http://www.example.com/events/2013/5/5/someone's-event
but the crawler ends up requesting
http://www.example.com/events/2013/5/5/someone
My sitemap.xml's URL entry does contain the raw single quote (not entity-escaped). However, the online sitemap generators all produce the same thing: none of them entity-escape the single quote. I've also submitted my sitemap.xml to online validators, and it validates every time.
One thing I've noticed is that these online generators emit:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
whereas my sitemap.xml only contains:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
Could that have something to do with it?
Upvotes: 1
Views: 152
Reputation: 2562
Single quotes need to be entity-escaped (as &apos;) in the XML document. Unfortunately, a lot of crawlers out there (including some major ones) don't decode the entity and so don't request the decoded version of the URL.
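As a rough illustration of the escaping described above, here is a minimal Python sketch. It assumes the sitemap is generated by your own script; the function names `sitemap_escape` and `sitemap_unescape` are made up for this example and are not part of any sitemap tool. It uses the standard library's `xml.sax.saxutils`, which escapes `&`, `<` and `>` by default and takes extra entities as a dictionary.

```python
from xml.sax.saxutils import escape, unescape

# Extra entities on top of the defaults (&, <, >) that saxutils handles.
_EXTRA_ESCAPES = {"'": "&apos;", '"': "&quot;"}
_EXTRA_UNESCAPES = {"&apos;": "'", "&quot;": '"'}


def sitemap_escape(url: str) -> str:
    """Entity-escape a URL before writing it into a <loc> element."""
    return escape(url, _EXTRA_ESCAPES)


def sitemap_unescape(text: str) -> str:
    """What a well-behaved crawler should do before requesting the URL."""
    return unescape(text, _EXTRA_UNESCAPES)


if __name__ == "__main__":
    raw = "http://www.example.com/events/2013/5/5/someone's-event"
    escaped = sitemap_escape(raw)
    print("<url><loc>" + escaped + "</loc></url>")
    # Decoding should give back the original URL.
    print(sitemap_unescape(escaped) == raw)
```

A crawler that skips the decoding step will still request the wrong URL, so this only helps with crawlers that parse the XML properly.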
Upvotes: 1