John Bachir

Reputation: 22751

Why is Googlebot requesting HTML from JSON-only URLs?

On a page like this: https://medstro.com/groups/nejm-group-open-forum/discussions/61

I have code like this:

$.getJSON("/newsfeeds/61?order=activity&type=discussion", function(response) {
  $(".discussion-post-stream").replaceWith($(response.newsfeed_html));
  $(".stream-posts").before($("<div class=\'newsfeed-sorting-panel generic-12\' data-id=\'61\'>\n<div class=\'newsfeed-type-menu generic-12\'>\n<ul class=\'newsfeed-sorting-buttons\'>\n<li>\n<span>\nShow\n<\/span>\n<\/li>\n<li>\n<select id=\"type\" name=\"type\"><option selected=\"selected\" value=\"discussion\">Show All (15)<\/option>\n<option value=\"discussion_answered\">Answered Questions (15)<\/option>\n<option value=\"discussion_unanswered\">Unanswered Questions (0)<\/option><\/select>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n"));
  Newsfeed.prepare_for_newsfeed_sort($(".newsfeed-sorting-panel"));
});

Googlebot has decided that it wants to see if there is any interesting HTML at /newsfeeds/61?order=activity&type=discussion. So it attempts to crawl that URL requesting HTML, and my app raises an error: "ActionView::MissingTemplate: Missing template newsfeeds/show..."

  1. Why is Googlebot trying to crawl this URL? Is it just because it thinks there's a chance something interesting is there, and it tries to crawl everything? Or is it because of something wrong in my code?
  2. What's the best way to deal with this in Rails? I don't want to ignore all MissingTemplate errors, because there might be cases that signal something truly wrong down the road. The same goes for ignoring all errors created by bots. Do I have any other options?
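One common approach (a sketch, not from the original post: the controller, action, and partial names are assumed from the /newsfeeds/61 URL and the `newsfeed_html` key in the JSON response) is to answer HTML requests for the endpoint explicitly instead of letting Rails hunt for a template that doesn't exist:

```ruby
# app/controllers/newsfeeds_controller.rb -- illustrative sketch only
class NewsfeedsController < ApplicationController
  def show
    @newsfeed = Newsfeed.find(params[:id])

    respond_to do |format|
      # The real response: the rendered stream HTML wrapped in JSON
      format.json do
        render json: { newsfeed_html: render_to_string(partial: "stream") }
      end
      # Bots requesting HTML get a clean 406 instead of raising
      # ActionView::MissingTemplate and polluting the error tracker
      format.html { head :not_acceptable }
    end
  end
end
```

This keeps genuine MissingTemplate errors elsewhere in the app visible, while this one endpoint deliberately declares which formats it serves.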

Upvotes: 3

Views: 570

Answers (2)

xkothe

Reputation: 674

There is nothing wrong with bots trying to find new links on your page. They are doing their job.

Maybe you can use one of the meta tags from this question in your view: Is there a way to make robots ignore certain text?

These tags tell Googlebot "don't look here":

<!--googleoff: all-->

$.getJSON("/newsfeeds/61?order=activity&type=discussion", function(response) {
  $(".discussion-post-stream").replaceWith($(response.newsfeed_html));
  $(".stream-posts").before($("<div class=\'newsfeed-sorting-panel generic-12\' data-id=\'61\'>\n<div class=\'newsfeed-type-menu generic-12\'>\n<ul class=\'newsfeed-sorting-buttons\'>\n<li>\n<span>\nShow\n<\/span>\n<\/li>\n<li>\n<select id=\"type\" name=\"type\"><option selected=\"selected\" value=\"discussion\">Show All (15)<\/option>\n<option value=\"discussion_answered\">Answered Questions (15)<\/option>\n<option value=\"discussion_unanswered\">Unanswered Questions (0)<\/option><\/select>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n"));
  Newsfeed.prepare_for_newsfeed_sort($(".newsfeed-sorting-panel"));
});

<!--googleon: all-->

Upvotes: 1

yolabingo

Reputation: 108

Presumably it parsed that URL from the page source, and is just trying to crawl your site.

It's best to tell Google what to crawl (and not crawl) with a sitemap.xml file and a robots.txt file for your site.

You can tell Googlebot not to crawl pages with these (or any) GET parameters in robots.txt:

Disallow: /*?
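If blocking every parameterized URL is too broad for your site, a narrower rule scoped to just these endpoints might look like this (assuming, as the question's URL suggests, that the JSON feeds all live under /newsfeeds/):

```
# robots.txt -- block only the JSON newsfeed endpoints
User-agent: *
Disallow: /newsfeeds/
```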

Upvotes: 1
