Steve Payne

Reputation: 633

Facebook share links: "The document returned no data. 403"

Since this morning, my website has started failing with 403 error codes when anybody attempts to share a link to my website.

When I use the Facebook debugger tool, it suggests: 'This response code could be due to a robots.txt block. Please allowlist facebookexternalhit on your site's robots.txt config to utilize Facebook scraping.'

https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fwww.infinitesweeps.com%2Fsweepstake%2F280358-Sierra-Nevada-Game-Day-Giveaway.html

I have a whitelist override rule set up in Cloudflare, and I can see the Facebook bot hitting the page every time I attempt to update the last scraped time. It receives a '200 OK' response from a cached page (meaning Cloudflare is serving Facebook a cached copy of the link).

But for some reason, Facebook is saying 'The document returned no data.'

https://developers.facebook.com/tools/debug/echo/?q=https%3A%2F%2Fwww.infinitesweeps.com%2Fsweepstake%2F280358-Sierra-Nevada-Game-Day-Giveaway.html

All of my site's 403 pages return data (an HTML page saying that you're blocked, an error message, etc.). Basically, nothing should return blank.

Is this happening for anybody else? I have spent all morning debugging, and I am sure the Facebook bot is correctly accessing my website (it receives a 200 OK from a cached page that other users also receive fine), yet the debugger still shows these errors.

Where, and how, can I begin to fix this?

I have updated my robots.txt file to specifically allow the Facebook bot (and Google bot and Twitter).

https://www.infinitesweeps.com/robots.txt
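One way to double-check what the crawler actually receives is to request the page with the crawler's user-agent and compare the response with what a normal browser gets. A minimal standard-library sketch (the user-agent string is the one facebookexternalhit sends; the network call is left commented out so the snippet is safe to run anywhere):

```python
import urllib.request

# The user-agent string Facebook's link crawler identifies itself with.
FB_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

# Build the same request the crawler would make for the page in question.
req = urllib.request.Request(
    "https://www.infinitesweeps.com/sweepstake/280358-Sierra-Nevada-Game-Day-Giveaway.html",
    headers={"User-Agent": FB_UA},
)

# Uncomment to actually fetch: a healthy page should return status 200
# with a non-empty body containing the og: meta tags.
# resp = urllib.request.urlopen(req, timeout=10)
# print(resp.status, len(resp.read()))

print(req.get_header("User-agent"))
```

If the body comes back empty or with a 403 when fetched with this user-agent (but not with a browser's), the block is happening somewhere between Cloudflare and the origin rather than in robots.txt.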

Upvotes: 1

Views: 576

Answers (2)

Aad van Yperen

Reputation: 1

I had the same issue. Changing the robots.txt file didn't help for me. In the error log I found that crawling was blocked on the server side. I contacted my hosting company's support and got the following response:

The Facebook crawler has been temporarily blocked on some servers because it showed very aggressive crawling behavior (more than 100,000 hits per day across multiple websites). This was not only the case with our servers, but also with servers at other hosting companies. We contacted Facebook about this, but unfortunately we never received a response to our email.

The Facebook crawler now has access again, and we are keeping an eye on it.

After this, everything worked fine for me.

Upvotes: 0

Steve Payne

Reputation: 633

The issue resolved itself for me after almost 12 hours. The only thing I really did was update my robots.txt file to include:

User-agent: Twitterbot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: *
Disallow: /upgrade/
Disallow: /share/
Disallow: /twitter/
Disallow: /facebook/
Disallow: /sponsor/
Disallow: /paypal/
Disallow: /forum/profile/
Disallow: /login.php?*
Allow: /

I previously had

User-agent: *
Disallow: /upgrade/
Disallow: /share/
Disallow: /twitter/
Disallow: /facebook/
Disallow: /sponsor/
Disallow: /paypal/
Disallow: /forum/profile/
Disallow: /login.php?*
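The difference between the two files can be checked offline with Python's standard-library robots.txt parser. A minimal sketch (rule sets abridged to the lines that matter here): under the old file, facebookexternalhit fell into the `*` group and inherited its Disallow rules, while the new file gives the crawler its own group that allows everything. Note that `/login.php?*` uses a wildcard, which plain robots.txt matching (and this parser) treats literally.

```python
import urllib.robotparser

# New rules: facebookexternalhit gets its own, fully-allowing group.
NEW_RULES = """\
User-agent: facebookexternalhit
Allow: /

User-agent: *
Disallow: /upgrade/
Allow: /
"""

# Old rules: no crawler-specific group, so everyone inherits these.
OLD_RULES = """\
User-agent: *
Disallow: /upgrade/
"""

def can_fetch(rules: str, agent: str, path: str) -> bool:
    """Parse a robots.txt body and ask whether `agent` may fetch `path`."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch(agent, path)

UA = "facebookexternalhit/1.1"

# New file: the crawler matches its own group and may fetch anything.
print(can_fetch(NEW_RULES, UA, "/upgrade/"))                # True
# Old file: the crawler falls under '*' and inherits the Disallow rules.
print(can_fetch(OLD_RULES, UA, "/upgrade/"))                # False
# Ordinary pages (like the sweepstake URL) were never blocked either way.
print(can_fetch(OLD_RULES, UA, "/sweepstake/280358.html"))  # True
```

So the sweepstake pages themselves were allowed under both versions, which is consistent with the issue resolving itself on Facebook's side rather than being fixed by this edit; the new file just makes the crawler's access explicit.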

Hope this works for you if you're having the same problem! The change took about an hour to take effect.

Upvotes: 0
