steveo225

Reputation: 11882

Properly handling a 404 File not found request

I have a custom 404 page that has 2 main goals:

  1. Log 404 errors so I can fix broken links and find evil people searching for exploits
  2. Redirect to the proper location with a 301 Moved Permanently for pages that have actually moved

Everything else is just redirected to the main page (the handler is sketched roughly below). The problem I am having is with bots. Google is the worst: every few days it keeps trying to crawl pages that don't exist. I have even tried adding the pages as Disallow entries in my robots.txt, but it ignores them for some reason, and the pages still turn up in the search results!
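For context, the handler is roughly this shape. This is only a minimal sketch: the log file, the redirect map, and http://website.com are placeholders, not the actual code:

<?php
// 404.php - custom error page configured in IIS
$requested = $_SERVER['REQUEST_URI'];
$agent     = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Goal 1: log the miss so broken links and exploit probes can be reviewed later
error_log(date('c') . " 404: $requested UA: $agent\n", 3, __DIR__ . '/404.log');

// Goal 2: pages that have genuinely moved get a 301 to their new location
$moved = array(
    '/old-page.html' => '/new-page.html', // placeholder entries
);
if (isset($moved[$requested])) {
    header('Location: http://website.com' . $moved[$requested], true, 301);
    exit;
}

// Everything else goes to the main page
header('Location: http://website.com/');
exit;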

I'd like to fix this properly, so I am looking for suggestions. Note that this is a paid web host, so changing web server settings is probably not an option. The server is running Windows with IIS 7.

Some issues I am having:

If I detect Googlebot (and a few other major bots) and manually send a 404 status code, the web server traps that, tries to re-execute the custom 404 page, and I end up in an infinite loop.

If I have the page print a message, it responds with a status code of 200.
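Presumably the status line on the custom page is being sent something like this before any output; the user-agent check here is just a placeholder for whatever bot detection is in place:

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (strpos($agent, 'Googlebot') !== false) {
    header('HTTP/1.1 404 Not Found'); // IIS sees this status and re-runs the custom error page
    echo 'Page not found';
    exit;
}
// Without the header() call above, the printed message goes out with a 200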

Upvotes: 2

Views: 187

Answers (1)

472084

Reputation: 17886

You should have a look at https://www.google.com/webmasters/

The files the bots are trying to index must have existed or been linked to at some point. Using the link above, Google will tell you which pages it has requested and how many of them returned a 404.

Your robots.txt must be incorrect if Google is ignoring it; they definitely follow the rules, otherwise they would get into a lot of trouble.

You can also use the link above to make sure Google is actually working from the latest robots.txt; it will let you know if there are any problems with it.
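For reference, each blocked path needs its own Disallow line under the relevant User-agent block; the paths below are just placeholders:

User-agent: *
Disallow: /old-section/
Disallow: /old-page.html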

To redirect with a 301 status code, you can simply do this in PHP:

Header( "HTTP/1.1 301 Moved Permanently" ); 
Header( "Location: http://website.com" );

You will just need to insert all the relevant info (old URL to new URL) into your database first.
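A rough sketch of that lookup on the 404 page, assuming PDO with the SQL Server driver; the connection details, table, and column names are made up for illustration:

<?php
// Map the requested path to its new location, if one exists
$pdo  = new PDO('sqlsrv:Server=localhost;Database=site', 'user', 'password');
$stmt = $pdo->prepare('SELECT new_url FROM redirects WHERE old_url = ?');
$stmt->execute(array($_SERVER['REQUEST_URI']));
$new  = $stmt->fetchColumn();

if ($new !== false) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://website.com' . $new);
    exit;
}
// No match: fall through to whatever the page does for unknown URLs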

Upvotes: 3
