Reputation: 1721
With regard to Google's AJAX crawling spec, if the server returns one thing (namely, a JavaScript-heavy file) for a #! URL and something else (namely, an "HTML snapshot" of the page) to Googlebot when the #! is replaced with ?_escaped_fragment_=, that feels like cloaking to me. After all, how can Googlebot be sure that the server is returning good-faith equivalents for both the #! and ?_escaped_fragment_= URLs? Yet this is what the AJAX crawling spec actually tells webmasters to do. Am I missing something? How is Googlebot sure that the server is returning the same content in both cases?
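For reference, the spec's mapping between the two URLs is mechanical: the crawler takes everything after the #!, percent-encodes it, and appends it as the value of _escaped_fragment_. Here is a minimal sketch of that transformation in Python; the example URL is made up, and this illustrates the mapping the spec describes, not Google's actual code:

```python
from urllib.parse import quote, urlsplit

def escaped_fragment_url(url):
    """Map a #! URL to its ?_escaped_fragment_= form per the AJAX crawling spec."""
    base, _, fragment = url.partition("#!")
    # The spec requires certain reserved characters in the fragment to be
    # percent-encoded; encoding everything here is a conservative simplification.
    encoded = quote(fragment, safe="")
    separator = "&" if urlsplit(base).query else "?"
    return f"{base}{separator}_escaped_fragment_={encoded}"

# Hypothetical example:
print(escaped_fragment_url("http://example.com/page#!key=value"))
# -> http://example.com/page?_escaped_fragment_=key%3Dvalue
```

The point of the question is that nothing in this mapping ties the two responses together: the server is free to return anything it likes for the ?_escaped_fragment_= version.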
Upvotes: 4
Views: 679
Reputation: 3605
The crawler does not know. But it never knows even for sites that return plain ol' HTML either - it is extremely easy to write server code that cloaks the site based on the HTTP headers crawlers send or on their known IP address ranges.
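To illustrate how little effort that takes, here is a sketch of User-Agent-based cloaking in a small Flask app (the route and response strings are hypothetical; this shows the kind of trick the spec cannot prevent, not a recommendation):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/page")
def page():
    # Trivial cloak: inspect the User-Agent header and serve the crawler
    # something different from what real visitors get.
    ua = request.headers.get("User-Agent", "")
    if "Googlebot" in ua:
        return "<html><body>Keyword-stuffed copy for the crawler.</body></html>"
    return "<html><body>What human visitors actually see.</body></html>"
```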
See this related question: How does Google Know you are Cloaking?
Most of it seems like conjecture, but it seems likely there are various checks in place, ranging from spoofing normal browser headers to an actual person looking at the page.
Continuing the conjecture, it certainly wouldn't be beyond the capabilities of Google's programmers to write a crawler that retrieves what the user actually sees - after all, they have their own browser that does just that. It would be prohibitively CPU-expensive to do that for every page, but it probably makes sense for the occasional spot-check.
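As a rough illustration of what such a spot-check might look like (pure conjecture, in keeping with the above - the URLs are made up and a real check would be far more sophisticated), one could render the #! page in a real browser and compare its visible text against the snapshot served for the ?_escaped_fragment_= URL:

```python
import difflib

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

def spot_check(hashbang_url, snapshot_url, threshold=0.8):
    """Compare browser-rendered text with the snapshot a crawler would receive."""
    # What a real user sees: load the #! URL in an actual browser so the
    # JavaScript runs, then grab the visible text. A real check would also
    # wait for the page's scripts to finish rendering.
    driver = webdriver.Chrome()
    try:
        driver.get(hashbang_url)
        rendered = driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()

    # What the crawler was served for the ?_escaped_fragment_= URL.
    snapshot = requests.get(snapshot_url, timeout=10).text

    # Crude similarity measure; a real system would compare extracted text,
    # links, and structure rather than raw strings.
    ratio = difflib.SequenceMatcher(None, rendered, snapshot).ratio()
    return ratio >= threshold

# Hypothetical usage:
# spot_check("http://example.com/page#!key=value",
#            "http://example.com/page?_escaped_fragment_=key=value")
```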
Upvotes: 1