Reputation: 55643
I was fixing URLs on a website. One problem was that the same URL appeared sometimes with upper-case and sometimes with lower-case characters; the server did not care, but Google did, and indexed the pages as duplicates. Some URLs also contained characters that are not allowed in that part of the URL, such as commas "," and brackets "()". Although [round brackets are technically not reserved][1], I still decided to get rid of them by encoding them.
I added a check that tests whether the requested URL is valid and, if it is not, does a 301 redirect to the correct URL.
For example, http://www.example.com/articles/SomeGreatArticle(2012).html would do a 301 redirect to http://www.example.com/articles/somegreatarticle%282012%29.html
It works: a single redirect leads to the correct URL.
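To make the check concrete, here is a minimal Python sketch of the kind of normalization I mean. It is not the site's actual code; the names `canonical_url` and `redirect_target` are made up for illustration, and it assumes only the path needs lower-casing and encoding.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lower-case the path and percent-encode everything outside a small safe set."""
    parts = urlsplit(url)
    # Decode first so input that is already encoded does not get encoded twice.
    path = unquote(parts.path).lower()
    # quote() always leaves letters, digits and "_.-~" alone; safe="/" keeps the
    # path separators too. Commas and round brackets get encoded.
    path = quote(path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment))


def redirect_target(url: str):
    """Return the URL to 301 to, or None when the request is already canonical."""
    canonical = canonical_url(url)
    return None if canonical == url else canonical


print(redirect_target("http://www.example.com/articles/SomeGreatArticle(2012).html"))
# http://www.example.com/articles/somegreatarticle%282012%29.html
```

A request for the already-encoded, lower-case URL returns `None`, so from the server's point of view there is exactly one redirect.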
But for a small fraction of the pages (possibly the only pages Google has indexed so far), Google Webmaster Tools started to report the following error under the Crawl errors > Not followed tab:
Google couldn't follow your URL because it redirected too many times.
Googling for this error in quotes gives me 0 results, and I'm sure I'm not the only one ever to get it, so I would like to know more about it, for example:
Upvotes: 1
Views: 2860
Reputation: 1125
OK, first of all I would remove the "()" and "," characters from the URLs; it is a fact that Googlebot has a harder time working with these, and they don't bring any SEO benefit either. Readability for the visitor isn't an issue, so if I were you I would just use a hyphen (-) or an underscore (_). Try not to use any other special characters in your file and folder names.
You should also clean up your HTML; there are quite a few errors and issues to resolve.
A cleaner source is better for Google, browsers and your visitors.
I couldn't find any definitive problem that Google would have an issue with.
Upvotes: 0
Reputation: 55643
SOLUTION
According to this experiment: http://www.monperrus.net/martin/google+url+encoding
Google has its own character encoding rules: it will always encode some characters and always decode others.
The following characters are never encoded:
-,.@~_*)!$'(
So even if you give Google this URL
http://www.example.com/articles/somegreatarticle%282012%29.html
where the round brackets () are encoded, Google will transform it, decode the brackets, and follow this URL instead:
http://www.example.com/articles/somegreatarticle(2012).html
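As a rough Python illustration of that rewriting (this only approximates the behaviour reported in the linked experiment, it is not Google's real code, and `crawler_view` is a made-up name):

```python
import re

# The "never encoded" characters listed above.
NEVER_ENCODED = "-,.@~_*)!$'("


def crawler_view(url: str) -> str:
    """Decode a %XX escape only when it stands for a never-encoded character."""
    def decode_if_listed(match):
        char = chr(int(match.group(1), 16))
        return char if char in NEVER_ENCODED else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", decode_if_listed, url)


print(crawler_view("http://www.example.com/articles/somegreatarticle%282012%29.html"))
# http://www.example.com/articles/somegreatarticle(2012).html
```

Escapes for other characters, such as `%20`, are left untouched by this sketch.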
What happened in my situation: for
http://www.example.com/articles/somegreatarticle(2012).html
my server would do a 301 redirect to
http://www.example.com/articles/somegreatarticle%282012%29.html
while Googlebot would decode the encoded brackets again and follow:
http://www.example.com/articles/somegreatarticle(2012).html
get redirected to
http://www.example.com/articles/somegreatarticle%282012%29.html
follow
http://www.example.com/articles/somegreatarticle(2012).html
get redirected to
http://www.example.com/articles/somegreatarticle%282012%29.html
and give up after a couple of tries, showing the "Google couldn't follow your URL because it redirected too many times" error.
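The loop, and one possible way out, can be simulated in a few lines of Python. Everything here is an assumption for illustration: `server_canonical` stands in for my redirect check, `crawler_rewrite` for the decoding described in the experiment, and the limit of 5 hops is arbitrary (I don't know how many redirects Googlebot really follows). The fix sketched here is simply to stop encoding the characters Google never encodes, so that both sides agree on the same URL.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

NEVER_ENCODED = "-,.@~_*)!$'("  # the list quoted in this answer


def server_canonical(url: str, safe: str) -> str:
    """Hypothetical server rule: lower-case the path, encode all but `safe`."""
    parts = urlsplit(url)
    path = quote(unquote(parts.path).lower(), safe=safe)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def crawler_rewrite(url: str) -> str:
    """Crude stand-in for the crawler: decode escapes of never-encoded characters.

    Only upper-case hex escapes are handled, which is what quote() emits.
    """
    for char in NEVER_ENCODED:
        url = url.replace("%{:02X}".format(ord(char)), char)
    return url


def count_redirects(start: str, safe: str, max_hops: int = 5):
    """Return the number of redirects followed, or None if the crawler gives up."""
    url, hops = crawler_rewrite(start), 0
    while hops < max_hops:
        target = server_canonical(url, safe)
        if target == url:
            return hops  # the server serves the page, no further redirect
        url = crawler_rewrite(target)
        hops += 1
    return None  # too many redirects


start = "http://www.example.com/articles/SomeGreatArticle(2012).html"
print(count_redirects(start, safe="/"))                  # None: endless loop
print(count_redirects(start, safe="/" + NEVER_ENCODED))  # 1: lower-casing only
```

With the wider safe set the only redirect left is the one that fixes the letter case, and the crawler's rewritten URL is identical to the server's canonical one.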
Upvotes: 3
Reputation: 1133
I don't know about Google Webmaster Tools, but I have seen a similar error in PHP when there is an infinite redirect loop. Make sure that none of the pages redirects to itself.
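If the redirects can be listed as a simple source-to-target mapping, a check like the following would find such loops. This is only a sketch in Python; the `redirects` dictionary and the function name are hypothetical, and a real site would have to build the mapping from its own rewrite rules.

```python
def find_redirect_loop(redirects: dict, start: str):
    """Follow redirects from `start`; return the URLs forming a loop, or None."""
    seen = []
    url = start
    while url in redirects:
        if url in seen:
            # We are back at a URL already visited: everything from its first
            # occurrence onwards is the loop.
            return seen[seen.index(url):]
        seen.append(url)
        url = redirects[url]
    return None  # the chain ended on a page that does not redirect


# Hypothetical redirect table for illustration.
redirects = {"/a": "/b", "/b": "/a", "/c": "/c", "/d": "/e"}
print(find_redirect_loop(redirects, "/a"))  # ['/a', '/b']
print(find_redirect_loop(redirects, "/c"))  # ['/c'] (a page redirecting to itself)
print(find_redirect_loop(redirects, "/d"))  # None
```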
Upvotes: 0