Tilak Madichetti

Reputation: 4346

How to know if it's actually a 404 page?

What I learned from Foregenix:

The HTTP 404 Not Found Error means that the webpage you were trying to reach could not be found on the server. It is a Client-side Error which means that either the page has been removed or moved and the URL was not changed accordingly, or that you typed in the URL incorrectly

But I also do web app pentests with Python, and I am wondering: if I only check for the string "404" on the page, it may not really be a 404 error. The page could exist and simply show a "404" heading to fool us.

So how exactly do I find out?

Upvotes: 12

Views: 5915

Answers (3)

Bèr Kessels

Reputation:

In addition to Anders' answer, I found a way to detect some cases where 404 is misused, using a timing attack. It is hardly reliable, though.

  • Send 404 instead of 403, to hide the resource that requires authentication.

Servers often need more time to determine that "you don't have authorization to get this resource", because that requires more round trips to external resources such as databases, than they need to determine that "this is not there", which is often cacheable and quick to determine.

A typical example in an MVC application with an RDBMS as backend is the difference between a simple SELECT COUNT(id) FROM articles WHERE id=123 LIMIT 1 and the much more complex SELECT access FROM accesses JOIN articles ON articles.id = accesses.foreign_id WHERE articles.id = 123 AND accesses.type='articles' AND accesses.user_id = (SELECT id FROM users WHERE token='t0k3n' LIMIT 1). And that assumes the application can make such single-line queries in the first place: more often it is a lot of "fetch a user, extract some data, now fetch a Thing, now ask Thing if user may access it through an authorization API".

Unless the developers or the framework of the site took care to cover this case, quite often you'll see a notable difference in time to serve both cases of 404.

  • Send 404 instead of 500, to hide the fact something is not working.

Typically, crashes or unexpected errors occur only after some code has run. 404 detection often comes early: after all, it is cheap to determine that something is not there (see above), whereas the error would occur later on. This means that such a 500 hidden as a 404 would quite often take a lot longer to reach you than a normal 404.

  • Send 404 when your IP is blocked for some reason.

Here, the timing is often the other way around, depending on the implementation. Such IP blocking would often be kept outside of the web app (CMS etc.) because it is much simpler and more performant to handle higher up in the stack: the web server, a proxy, etc. However, when the application itself takes care of this, generating an actual 404 is often reasonably cheap, whereas looking up an IP in a database, applying masks and so on takes some time. This is similar to hiding a 403 as a 404.
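The timing comparison described above could be sketched roughly as follows. This is only an illustration, not a reliable detector: the function names, the sample count and the 1.5x ratio threshold are all assumptions, and network jitter can easily swamp the effect. The idea is to time a URL that certainly does not exist (the baseline) and compare it with the suspicious 404.

```python
import statistics
import time
import urllib.error
import urllib.request


def time_request(url, timeout=10):
    """Return (status_code, seconds) for a single GET request."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still available.
        err.read()
        status = err.code
    return status, time.perf_counter() - start


def sample(url, n=20):
    """Collect n response times for url (n is an arbitrary choice)."""
    return [time_request(url)[1] for _ in range(n)]


def timing_differs(baseline, candidate, ratio=1.5):
    """True if the median times differ by at least `ratio` (assumed threshold)."""
    fast, slow = sorted((statistics.median(baseline), statistics.median(candidate)))
    return fast > 0 and slow / fast >= ratio
```

In use, you would call sample() on a random, surely nonexistent path and on the path under test, then pass both lists to timing_differs(). Medians are used rather than means so that a single slow request does not dominate the result.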

Upvotes: 4

Anders

Reputation: 8577

You can check the HTTP status code, and see if it is 404 or not. The status code is on the first line of the response:

HTTP/1.1 404 Not Found

If you are using httplib (http.client in Python 3), you can just read the status attribute of the HTTPResponse object.
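A minimal sketch of that check with Python 3's http.client is shown here. The helper names fetch_status and classify, and the labels classify returns, are made up for illustration. The point is that the decision is based on the status code, with the body text only used to flag pages that merely mention "404".

```python
import http.client


def fetch_status(host, path, use_tls=True, timeout=10):
    """Send a GET request and return (status, reason, body_text)."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, resp.reason, body
    finally:
        conn.close()


def classify(status, body):
    """Separate a real 404 status from a page that only displays '404'."""
    if status == 404:
        return "status-404"      # the server actually answered 404
    if "404" in body:
        return "soft-404-text"   # some other status, but the page mentions 404
    return "other"
```

For example, classify(*fetch_status("example.com", "/no-such-page")[::2]) would label the response by its status code first and look at the page text only as a fallback.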

However, it is the server that decides what HTTP status code to send. Just because 404 is defined to mean "page not found" does not mean the server cannot lie to you. It is quite common to do things like this:

  • Send 404 instead of 403, to hide the resource that requires authentication.
  • Send 404 instead of 500, to hide the fact something is not working.
  • Send 404 when your IP is blocked for some reason.

Without access to the server, it is impossible to know what is really going on behind the curtains.

Upvotes: 55

A. Darwin

Reputation: 260

You are right: someone could write "404 Page Not Found" in an HTML page and make you think that the page doesn't exist.

In order to properly recognize HTTP status codes such as the 404, you should capture the HTTP response with Python and parse it. HTTP 1 and HTTP 2 standards dictate that an HTTP response, which is written in the HTTP generic message format, must contain the status code.

Example of an HTTP response (from Tutorials Point):

HTTP/1.1 404 Not Found
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Connection: Closed
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
  <h1>Not Found</h1>
   <p>The requested URL /t.html was not found on this server.</p>
</body>
</html>

You should definitely not trust the HTML part, which can show a 404 error (or even a 418 I'm a teapot) when the page can in fact be found.
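If you capture the raw response yourself, reading the status code from the first line might look like the sketch below. It assumes an HTTP/1.x text response like the example above (HTTP/2 carries the status in a binary header field instead), and the function name parse_status_line is made up for illustration.

```python
def parse_status_line(raw):
    """Extract (version, status_code, reason) from a raw HTTP/1.x response.

    Only the first line is inspected; the HTML body is ignored on purpose.
    """
    first_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # Raises ValueError if the line has fewer than two space-separated fields.
    version, code, *reason = first_line.split(" ", 2)
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError("not an HTTP/1.x status line: %r" % first_line)
    return version, int(code), reason[0] if reason else ""
```

A body containing "<title>404 Not Found</title>" has no effect on the result: only the status line decides.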

Upvotes: 9
