This code starts an HTTP server which listens for requests on port 8080. When compiled with Delphi 2009, the Chinese text is rendered correctly. With Free Pascal 2.6.0, however, the browser displays ä¸æ–‡ instead of 中文.
What is the correct way to write Unicode / UTF-8 HTTP responses with Indy and Free Pascal?
program IdHTTPUnicode;
{$APPTYPE CONSOLE}
uses
  IdHTTPServer, IdCustomHTTPServer, IdContext, IdSocketHandle, IdGlobal,
  SysUtils;
type
  TMyServer = class (TIdHTTPServer)
  public
    procedure InitComponent; override;
    procedure DoCommandGet(AContext: TIdContext;
      ARequestInfo: TIdHTTPRequestInfo;
      AResponseInfo: TIdHTTPResponseInfo); override;
  end;
procedure Demo;
var
  Server: TMyServer;
begin
  Server := TMyServer.Create(nil);
  try
    try
      Server.Active := True;
    except
      on E: Exception do
      begin
        WriteLn(E.ClassName + ' ' + E.Message);
      end;
    end;
    WriteLn('Hit any key to terminate.');
    ReadLn;
  finally
    Server.Free;
  end;
end;
procedure TMyServer.InitComponent;
var
  Binding: TIdSocketHandle;
begin
  inherited;
  Bindings.Clear;
  Binding := Bindings.Add;
  Binding.IP := '127.0.0.1';
  Binding.Port := 8080;
  Binding.IPVersion := Id_IPv4;
end;
procedure TMyServer.DoCommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
const
  UNI = '中文';
begin
  AResponseInfo.ContentText := '<html>' + UNI + '</html>';
  AResponseInfo.ContentType := 'text/html';
  AResponseInfo.CharSet := 'UTF-8';
end;
begin
  Demo;
end.
In the debugger, I can see that a different code path in TIdIOHandler.Write is executed, because STRING_IS_ANSI is defined for Free Pascal:
procedure TIdIOHandler.Write(const AOut: string; AByteEncoding: TIdTextEncoding = nil
  {$IFDEF STRING_IS_ANSI}; ASrcEncoding: TIdTextEncoding = nil{$ENDIF}
  );
begin
  if AOut <> '' then begin
    AByteEncoding := iif(AByteEncoding, FDefStringEncoding);
    {$IFDEF STRING_IS_ANSI}
    ASrcEncoding := iif(ASrcEncoding, FDefAnsiEncoding, encOSDefault);
    {$ENDIF}
    Write(
      ToBytes(AOut, -1, 1, AByteEncoding
        {$IFDEF STRING_IS_ANSI}, ASrcEncoding{$ENDIF}
      )
    );
  end;
end;
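To see what Write() is actually given, here is a small standalone Free Pascal program that prints the raw bytes the compiler stores for the literal. It is a sketch that assumes FPC 2.6 as in the question, the source file saved as UTF-8 without a BOM, and no {$codepage} directive; HexBytes is a name made up for this example. The UTF-8 encoding of 中文 is E4 B8 AD E6 96 87. If Write() decodes those six bytes with encOSDefault, presumably Windows-1252 on this machine, they become ä, ¸, a soft hyphen, æ, – and ‡, which re-encoded as UTF-8 is exactly the ä¸æ–‡ shown by the browser (the soft hyphen is invisible).

```pascal
program DumpLiteralBytes;
{$mode objfpc}{$H+}
// Assumption: this file is saved as UTF-8 (no BOM) and compiled without a
// {$codepage} directive, so the literal's source bytes are kept unchanged.
uses
  SysUtils;

const
  UNI = '中文';

// Hypothetical helper: returns the bytes of S as space-separated hex pairs.
function HexBytes(const S: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

begin
  // Print how many bytes the literal occupies and what they are.
  WriteLn(Length(UNI), ' bytes: ', HexBytes(UNI));
end.
```

If the output shows six bytes, the string holds UTF-8 data, and the problem is only that Indy is told to decode it with a different encoding.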
FreePascal strings are not UTF-16 encoded like they are in Delphi 2009+. In FreePascal, and in Delphi 2007 and earlier, your code needs to take the actual string encoding into account. That is why Indy exposes additional Ansi-based parameters/properties for those platforms.
When TIdHTTPServer writes out the ContentText using TIdIOHandler.Write(), the ASrcEncoding parameter is not used on non-Unicode platforms, so you will have to use the TIdIOHandler.DefAnsiEncoding property instead to let Write() know what the encoding of the ContentText is, eg:
procedure TMyServer.DoCommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
const
  UNI: WideString = '中文';
begin
  AResponseInfo.ContentText := UTF8Encode('<html>' + UNI + '</html>');
  AResponseInfo.ContentType := 'text/html';
  // this tells TIdHTTPServer what to encode bytes to during socket transmission
  AResponseInfo.CharSet := 'utf-8';
  // this tells TIdHTTPServer what encoding the ContentText is using
  // so it can be decoded to Unicode prior to then being charset-encoded
  // for output. If the input and output encodings are the same, the
  // Ansi string data gets transmitted as-is without decoding/reencoding...
  AContext.Connection.IOHandler.DefAnsiEncoding := IndyUTF8Encoding;
end;
Or, more generically:
{$I IdCompilerDefines.inc}
procedure TMyServer.DoCommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
const
  UNI{$IFNDEF STRING_IS_UNICODE}: WideString{$ENDIF} = '中文';
begin
  {$IFDEF STRING_IS_UNICODE}
  AResponseInfo.ContentText := '<html>' + UNI + '</html>';
  {$ELSE}
  AResponseInfo.ContentText := UTF8Encode('<html>' + UNI + '</html>');
  {$ENDIF}
  AResponseInfo.ContentType := 'text/html';
  AResponseInfo.CharSet := 'utf-8';
  {$IFNDEF STRING_IS_UNICODE}
  AContext.Connection.IOHandler.DefAnsiEncoding := IndyUTF8Encoding;
  {$ENDIF}
end;
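The WideString constant above relies on the compiler reading '中文' from the source file correctly. If you are unsure how your compiler settings treat non-ASCII bytes in source files, one option (my suggestion, not part of the answer above) is to spell the characters as numeric code points. The standalone sketch below checks that UTF8Encode turns U+4E2D U+6587 into the expected six UTF-8 bytes; EncodesAsExpected and EXPECTED are names made up for this example, and it assumes a Free Pascal version whose System unit provides UTF8Encode for WideString (FPC 2.6 does).

```pascal
program BuildUTF8FromCodePoints;
{$mode objfpc}{$H+}

const
  // Expected UTF-8 bytes of U+4E2D U+6587 ("中文").
  EXPECTED: array[0..5] of Byte = ($E4, $B8, $AD, $E6, $96, $87);

// Hypothetical helper: True if UTF8Encode produces exactly EXPECTED.
function EncodesAsExpected: Boolean;
var
  U: WideString;
  S: UTF8String;
  I: Integer;
begin
  // Numeric code points do not depend on the source file's encoding.
  U := #$4E2D#$6587;
  S := UTF8Encode(U);
  Result := Length(S) = Length(EXPECTED);
  if Result then
    for I := 0 to High(EXPECTED) do
      if Ord(S[I + 1]) <> EXPECTED[I] then
        Result := False;
end;

begin
  WriteLn('UTF-8 bytes as expected: ', EncodesAsExpected);
end.
```

In the handler, the same idea would mean declaring UNI as #$4E2D#$6587 instead of '中文' and leaving the rest of the code unchanged.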
In modern FreePascal, strings are UTF-8 by default unless you have changed the compiler options. Thus it seems that in iif(ASrcEncoding, FDefAnsiEncoding, encOSDefault); the value of encOSDefault is wrong.
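You could fix its detection in the Indy sources if you like, but I guess it would be better to set DefAnsiEncoding to a UTF-8 encoding object, e.g. DefAnsiEncoding := IndyUTF8Encoding; (DefAnsiEncoding takes an encoding object, not a charset name such as 'utf-8'; charset names used for CharSet are case-insensitive, so 'utf-8' and 'UTF-8' are equivalent).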
To be on the safe side, you can check for UTF-8 mode at the beginning of the program: declare some non-Latin constant (like that Chinese text, or Greek, or Cyrillic, whatever) and check whether it is stored as UTF-8 or not: http://compaspascal.blogspot.ru/2009/03/utf-8-automatic-detection.html
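A minimal sketch of that startup check (not the code from the linked article): it compares the stored bytes of a Greek probe constant, α (U+03B1), with its known UTF-8 encoding CE B1 and stops the program if they differ. PROBE and SourceIsUTF8 are names made up for this example; if the compiler converts the literal (for example because of a {$codepage} directive), the check simply reports "not UTF-8".

```pascal
program CheckUTF8Source;
{$mode objfpc}{$H+}

const
  // Greek small letter alpha (U+03B1); its UTF-8 encoding is CE B1.
  PROBE = 'α';

// Hypothetical helper: True if PROBE was stored as its raw UTF-8 bytes.
function SourceIsUTF8: Boolean;
begin
  Result := (Length(PROBE) = 2) and (Ord(PROBE[1]) = $CE) and
    (Ord(PROBE[2]) = $B1);
end;

begin
  if SourceIsUTF8 then
    WriteLn('String literals hold UTF-8 bytes')
  else
  begin
    WriteLn('String literals are not stored as UTF-8 bytes');
    Halt(1);
  end;
end.
```

Placing this check at program start makes a wrong source encoding fail loudly instead of showing up later as mojibake in HTTP responses.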
Overall, however, I think you may want to look for a library that cares more about FPC and Linux than Indy does. Indy seems to me to be stagnating and close to abandoned, even on Delphi. Maybe Synopse mORMot (look for the DataSnap performance tests article) can help you, or some library that comes with the CodeTyphon distro.