Casady

Why does this web server return code 404 for Indy, but code 200 for every browser?

I have a URL that works just fine in all browsers (five browsers tested on two computers), but if I try to get the page content using the Get() method of the Indy HTTP client (TIdHTTP), it returns error code 404, page not found. This is with the latest Indy SVN build (4985).

I suspect this may be a bug in Indy related to the "#" character in the URL (Indy cuts off everything after the #). If so, is there any way to work around it? Maybe replace the # character with its escape code?

Here is my example code. All that is needed for this is Delphi with Indy components and a form with a button and a memo.

procedure TForm1.Button1Click(Sender: TObject);
var
  HTTPClient1: TIdHTTP; // IdHTTP must be in the unit's uses clause
begin
  // Create the client before entering try/finally, so Free is never
  // called on an uninitialized reference if the constructor fails.
  HTTPClient1 := TIdHTTP.Create(nil);
  try
    Memo1.Clear;
    try
      HTTPClient1.HandleRedirects := True;
      // Identify as Chrome, in case the server filters on User-Agent
      HTTPClient1.Request.UserAgent := 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';
      Memo1.Text := HTTPClient1.Get('http://www.visionofhumanity.org/gpi-data/#/2011/scor/');
      Caption := HTTPClient1.ResponseText; // the form's caption
    except
      on E: Exception do
        Memo1.Lines.Add('Exception: ' + E.Message);
    end;
  finally
    HTTPClient1.Free;
  end;
end;

Answers (2)

Rob Kennedy

Your suspicion is correct. You've included the # section of the address in your request. Browsers don't do that because that section is reserved for in-page navigation. The server doesn't know that, so it tries to fetch the resource that corresponds to the full URL you gave it, including the # and everything afterward. Nothing matches, so it fails with status 404.

Either do as the browsers do and strip that section from the URL prior to sending the request to the server, or update Indy to revision 4987 so that it will happen automatically. Merely escaping the character will continue to yield status 404.
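A minimal sketch of the first option (stripping the fragment yourself before calling Get), written in plain Delphi/Free Pascal with no Indy dependency. StripFragment is a hypothetical helper name, not an Indy function; it simply drops everything from the first # onward, which is the part a browser never sends to the server.

```pascal
program StripFragmentDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: returns AURL without its fragment, i.e. everything
// from the first '#' onward is removed. URLs without '#' are unchanged.
function StripFragment(const AURL: string): string;
var
  P: Integer;
begin
  P := Pos('#', AURL); // 1-based index, 0 if not found
  if P > 0 then
    Result := Copy(AURL, 1, P - 1)
  else
    Result := AURL;
end;

begin
  // prints http://www.visionofhumanity.org/gpi-data/
  WriteLn(StripFragment('http://www.visionofhumanity.org/gpi-data/#/2011/scor/'));
end.
```

In the handler from the question you would then call something like HTTPClient1.Get(StripFragment(URL)), where URL is the full address including the # part. The server only ever sees the part before the #, which is what the browsers request as well.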

Remy Lebeau

# is a reserved character in URLs. If you want to use reserved characters inside a URL, you need to URL-encode them. TIdHTTP does not do that for you. It requires you to pass in an encoded URL, but you are passing in an unencoded URL instead. Since # is unencoded, it gets treated as an anchor and stripped off, so you are actually requesting http://www.visionofhumanity.org/gpi-data/, hence the 404 reply.

# is url-encoded as %23, so use this:

Memo1.Text := Get('http://www.visionofhumanity.org/gpi-data/%23/2011/scor/');

Or this:

Memo1.Text := Get(TIdURI.URLEncode('http://www.visionofhumanity.org/gpi-data/#/2011/scor/'));

Update: I tracked down the problem. It is another TIdURI parsing bug, this time related to having a / character after the # character. TIdURI checks for / characters before it checks for a # character, so the anchor portion of the URL was ending up in the TIdURI.Path property (previously it was ending up in the TIdURI.Params property) and thus submitted to the server. I have checked in a new fix (SVN rev 4987).
