Reputation: 10372
I came across a strange web page. (And being a developer, I have to solve the mystery.)
When I open the page in any browser, everything seems normal: the page is displayed as expected.
But when I look in the console, the server actually returns a 404 status code.
So why is the browser rendering a page?
Inspecting the response body shows that valid HTML is returned.
Hold on. The server responds with 404 and sends the HTML along with it? And the browser renders it?
Why is this happening? Is this some server misconfiguration? Or is something clever going on here that I don't understand? Is there a practical reason to deliberately configure a server to behave like this?
Upvotes: 5
Views: 8068
Reputation: 1
I had an identical issue on one of my sites and found this question. After roughly 10 hours of debugging, it turned out to be a combination of .htaccess and a PHP routing script. There are no static .html pages on my site; everything is built on demand and normally works fine. But one directory was behaving strangely in Chrome Lighthouse scores: it turned out everything in that directory was being served with a 404, although the pages looked and functioned as expected.
The reason was that my .htaccess was misconfigured and did not route that directory directly to the PHP script. There was, however, an 'ErrorDocument 404' directive set up, which inadvertently stepped in. Because that error document's name DID match the pattern for the PHP routing script, the page was then built successfully, yet the 404 status remained. As with most config issues, a one-character fix was needed: in my case, adding a '-' to the RewriteRule regex pattern. No more 404, and Lighthouse is all green.
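The failure mode above can be modelled in a few lines. This is a minimal Python sketch, not the actual configuration: the answer does not show the real RewriteRule, so both patterns, the directory name my-directory and the error document name notfound are hypothetical. It also assumes the router builds the page from the original request path, which is why the page looks right while the status stays 404.

```python
import re

# Hypothetical RewriteRule patterns: the real one is not shown in the answer,
# only that adding '-' to the character class fixed the problem.
ROUTE_WITHOUT_HYPHEN = re.compile(r"^[a-z0-9_/]+$", re.IGNORECASE)
ROUTE_WITH_HYPHEN = re.compile(r"^[a-z0-9_/-]+$", re.IGNORECASE)

# Hypothetical ErrorDocument 404 target; its name matches both patterns.
ERROR_DOCUMENT = "notfound"


def serve(path, route):
    """Rough model of the .htaccess flow: returns (status, page the router built)."""
    if route.match(path):
        # Rewrite matched: the PHP router builds the requested page normally.
        return 200, path
    # No rewrite matched and no static file exists, so Apache falls back to
    # ErrorDocument 404. If that URL matches the router pattern, the router
    # still builds the requested page (assumption: it reads the original
    # request path), but the response keeps its 404 status.
    if route.match(ERROR_DOCUMENT):
        return 404, path
    # Otherwise a plain 404 page is served and nothing is built.
    return 404, None
```

With the hyphen missing, serve("my-directory/page", ROUTE_WITHOUT_HYPHEN) returns (404, "my-directory/page"): the right page, the wrong status. With ROUTE_WITH_HYPHEN the same path returns (200, "my-directory/page").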
Upvotes: 0
Reputation: 1024
Note: I don't know whether this issue comes from my .htaccess not being strict enough or from my CMS.
In my contrived .htaccess example, I had the following rule to exclude these directories from being handled by the CMS:
RewriteCond $1 !^(branch|css|js|html|images) [NC]
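I also had a branches directory inside my CMS's templates (created within the CMS). I guess my .htaccess rule wasn't strict enough here: ^branch is only a prefix check, so it also matches branches, and requests for that directory were excluded from CMS handling as well. I had to change branch to branch\/, like so: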
RewriteCond $1 !^(branch\/|css|js|html|images) [NC]
Only then would the page load without the 404 in the console.
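The prefix problem is easy to reproduce outside Apache. This is a minimal Python sketch that reuses the two patterns from the RewriteCond lines above, with [NC] mapped to re.IGNORECASE. It assumes $1 holds the request path without a leading slash (typical for a RewriteRule such as ^(.*)$ in .htaccess context); the paths branches/news and branch/app.js are hypothetical.

```python
import re

# The two RewriteCond patterns from above; [NC] corresponds to IGNORECASE.
LOOSE = re.compile(r"^(branch|css|js|html|images)", re.IGNORECASE)
STRICT = re.compile(r"^(branch\/|css|js|html|images)", re.IGNORECASE)


def routed_to_cms(path, excluded):
    """RewriteCond $1 !^(...): the CMS rule applies only if the pattern does NOT match."""
    return excluded.match(path) is None


# ^branch is only a prefix check, so it also swallows "branches/...".
print(routed_to_cms("branches/news", LOOSE))   # False: excluded from the CMS
print(routed_to_cms("branches/news", STRICT))  # True: handled by the CMS
print(routed_to_cms("branch/app.js", STRICT))  # False: still excluded
```

The same prefix behaviour still applies to the other alternatives: a CMS page named html5-tips would also be excluded by ^(...|html|...).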
Upvotes: 0
Reputation: 113
I faced the same situation. My portal was hosted on a Tomcat server. The portal loaded when the host name plus the Tomcat directory path was requested; after loading, the web page redirected to a deep-link URL and rendered it. But if you open the deep-link URL directly in the browser, the Network tab in DevTools shows a 404 error, although the web page renders fine. This happens because there is no resource matching the deep-link URL anywhere in the server's configuration, so the server's lookup finds nothing and it returns 404. The browser, however, handles the URL differently: it first loads the application from the resource's host name and, once that succeeds, is redirected according to the configuration settings and renders the deep-link page's HTML and styling properly.
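A rough model of the deep-link situation described above, written as a minimal Python sketch rather than Tomcat configuration. Everything here is hypothetical (APP_SHELL, STATIC, the /portal/orders/42 path and the spa_fallback_status parameter): the server knows only a few real resources, but for any other path it still sends the application's HTML shell, which the browser renders and the client-side router turns into the right view. The only thing that differs between a "broken" and a "correct" setup is the status code attached to that fallback.

```python
# Hypothetical single-page-app shell; the client-side router in app.js
# would read the URL and render the matching view.
APP_SHELL = (
    "<!doctype html><html><body><div id='app'></div>"
    "<script src='/app.js'></script></body></html>"
)

# The only resources the server actually knows about.
STATIC = {"/": APP_SHELL, "/app.js": "/* client-side router */"}


def respond(path, spa_fallback_status=404):
    """Return (status, body) for a request path."""
    if path in STATIC:
        return 200, STATIC[path]
    # Unknown path, e.g. a deep link: no server-side resource exists, but the
    # shell is still sent, so the page renders fine while the Network tab
    # shows whatever status the fallback uses.
    return spa_fallback_status, APP_SHELL
```

respond("/portal/orders/42") yields a 404 whose body is the full app shell, matching the symptom in DevTools; calling respond("/portal/orders/42", spa_fallback_status=200) models a fallback that serves the same page with a success status.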
Upvotes: 0
Reputation: 10372
Another answer on Stack Overflow contains some interesting information: an HTTP status code of 404 plus an HTML response body is actually recommended by the spec.
The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server misconfiguration.
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
The second one would indeed be kind of clever if you don't want your page to be indexed.
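The behaviour the spec describes (a 404 whose body is a full HTML page) can be reproduced locally. This is a minimal Python sketch using only the standard library; the handler name NotFoundWithPage, the page content and the request path are made up for illustration. A browser pointed at this server would render the page, while a programmatic client such as urllib raises HTTPError on the 404, even though the HTML body is still there to read. That contrast is roughly what explanation 2 relies on: a client that checks the status code sees a failure.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<!doctype html><html><body><h1>Page content</h1></body></html>"


class NotFoundWithPage(BaseHTTPRequestHandler):
    """Answers every GET with status 404 but a complete HTML page as the body."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch(url):
    """Return (status, body). urllib raises HTTPError for 4xx, but the body is still readable."""
    # Bypass any proxy settings from the environment so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def demo():
    """Start the server on a free local port, request an arbitrary path, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithPage)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch(f"http://127.0.0.1:{port}/any/path")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status, body.decode())
```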
Upvotes: 3