Web crawlers truncating URLs with single quotes. Bad sitemap.xml maybe?

I'm getting quite a few failed requests on my server and they're mostly from web crawlers that encounter URLs with single quotes in them.

example: http://www.example.com/events/2013/5/5/someone's-event

and the crawler ends up browsing to

http://www.example.com/events/2013/5/5/someone

Now, my sitemap.xml's URL entry DOES contain the raw single quote (not entity-escaped). However, all of the online sitemap generators produce the same thing: they don't entity-escape the single quote either. I've also submitted my sitemap.xml to online validators, and it validates every time.

One thing I've noticed is that these online generators emit this root element:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

whereas my sitemap.xml only contains:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">

Could that have something to do with it?

Upvotes: 1

Views: 152

Answers (1)

Single quotes need to be entity-escaped in the XML document (as &apos;). Unfortunately, a lot of crawlers out there (including some major ones) don't decode the escaped URL before requesting it.
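
As a rough illustration, here is a minimal Python sketch of escaping the URL before writing it into the sitemap. The names `sitemap_loc`, `percent_encode_path`, and `EXTRA_ENTITIES` are made up for this example. The second helper percent-encodes the quote as %27 instead, which is an assumption on my part (not something tested against the crawlers in question) that it may sidestep crawlers that skip entity decoding.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes must be listed explicitly.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def sitemap_loc(url: str) -> str:
    """Return a <loc> element with the URL entity-escaped for XML (hypothetical helper)."""
    return "<loc>" + escape(url, EXTRA_ENTITIES) + "</loc>"


def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path, keeping '/' as-is (hypothetical helper)."""
    return quote(path, safe="/")


if __name__ == "__main__":
    url = "http://www.example.com/events/2013/5/5/someone's-event"
    print(sitemap_loc(url))
    print(percent_encode_path("/events/2013/5/5/someone's-event"))
```

With the percent-encoded form, the server has to treat /someone%27s-event and /someone's-event as the same resource, which most frameworks do by default, but that is worth checking on your own setup.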

Upvotes: 1
