Reputation: 6364
I have a TIdHttpServer application. I have a simple html document with special characters:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>This is the title</title>
</head>
<body>
<form method="post">
<p>
<input name="name" value="Все данные по веб-сайту" />
<input type="submit" value="submit" />
</p>
</form>
</body>
</html>
I serve this page and process the POST. My "Get" handler is below. The problem is that I am unable to decode the %hh-encoded data properly.
procedure TForm3.Get(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  mFileName: String;
  txtFile: TextFile;
begin
  if ARequestInfo.Params.Values['name'] <> '' then begin
    AssignFile(txtFile, ChangeFileExt(ParamStr(0), '.log'));
    Append(txtFile);
    WriteLn(txtFile, 'Unparsed:' + ARequestInfo.UnparsedParams);
    WriteLn(txtFile, 'Parsed:' + ARequestInfo.Params.Values['name']);
    MyDecodeAndSetParams(ARequestInfo);
    WriteLn(txtFile, 'Decoded:' + ARequestInfo.Params.Values['name']);
    System.Close(txtFile);
  end;
  mFileName := ExtractFileDir(ParamStr(0)) + '\inputform.txt';
  AResponseInfo.ContentStream := TFileStream.Create(mFileName, fmOpenRead);
end;
The MyDecodeAndSetParams function:
procedure MyDecodeAndSetParams(ARequestInfo: TIdHTTPRequestInfo);
var
  i, j: Integer;
  value, s: string;
  LEncoding: IIdTextEncoding;
begin
  if IsHeaderMediaType(ARequestInfo.ContentType, 'application/x-www-form-urlencoded') then
  begin
    value := ARequestInfo.FormParams;
    // LEncoding := CharsetToEncoding(ARequestInfo.CharSet);
    if ARequestInfo.CharSet <> '' then
      LEncoding := CharsetToEncoding(ARequestInfo.CharSet)
    else
      LEncoding := IndyTextEncoding_UTF8;
  end else
  begin
    value := ARequestInfo.QueryParams;
    LEncoding := IndyTextEncoding_UTF8;
  end;
  ARequestInfo.Params.BeginUpdate;
  try
    ARequestInfo.Params.Clear;
    i := 1;
    while i <= Length(value) do
    begin
      // Find the end of the current name=value pair
      j := i;
      while (j <= Length(value)) and (value[j] <> '&') do
      begin
        Inc(j);
      end;
      // '+' encodes a space in form data; convert it before percent-decoding
      s := StringReplace(Copy(value, i, j-i), '+', ' ', [rfReplaceAll]);
      ARequestInfo.Params.Add(TIdURI.URLDecode(s, LEncoding));
      i := j + 1;
    end;
  finally
    ARequestInfo.Params.EndUpdate;
  end;
end;
The output in my file is as follows:
Unparsed:name=%D0%92%D1%81%D0%B5+%D0%B4%D0%B0%D0%BD%D0%BD%D1%8B%D0%B5+%D0%BF%D0%BE+%D0%B2%D0%B5%D0%B1-%D1%81%D0%B0%D0%B9%D1%82%D1%83
Parsed:οсе даннϿе по веб-сайϿϿ
Decoded:οсе даннϿе по веб-сайϿϿ
I can take the Unparsed data and decode it using this decoder and it returns the string properly:
Все данные по веб-сайту
What do I need to do so that I can properly decode the params to what they were on the form?
Upvotes: 2
Views: 3106
Reputation: 595329
If ARequestInfo.CharSet is blank (because the client did not send a charset in the HTTP Content-Type header), CharsetToEncoding('') will return Indy's native 8-bit charset rather than UTF-8. That is why your data is not being decoded properly.
For application/x-www-form-urlencoded, a charset is not always sent in the HTTP headers, as the client may assume the server knows the charset to expect based on the charset it sends the HTML in. It is also possible that the client might send a charset in the posted form data instead, such as in a _charset_ field.
Try changing this:
LEncoding := CharsetToEncoding(ARequestInfo.CharSet);
To this:
if ARequestInfo.CharSet <> '' then
LEncoding := CharsetToEncoding(ARequestInfo.CharSet)
else
LEncoding := IndyTextEncoding_UTF8;
This way, you default to UTF-8 unless the client sends an explicit charset.
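To check the UTF-8 decoding in isolation, outside the server, a minimal console sketch along these lines may help. It assumes a Unicode version of Delphi (2009 or later) and an Indy 10 build that provides IndyTextEncoding_UTF8; the program name is hypothetical, and the input is just the first escaped character from the log in the question.

```delphi
program UrlDecodeCheck; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils, IdGlobal, IdURI;

var
  Decoded: string;
begin
  // '%D0%92' is the first escaped character of the posted value in the log.
  Decoded := TIdURI.URLDecode('%D0%92', IndyTextEncoding_UTF8);

  // Bytes D0 92 are the UTF-8 encoding of U+0412 (Cyrillic capital letter Ve),
  // so a correct UTF-8 decode yields exactly one UTF-16 code unit, $0412.
  if (Length(Decoded) = 1) and (Ord(Decoded[1]) = $0412) then
    WriteLn('UTF-8 decode OK')
  else
    WriteLn('Unexpected result, length = ', Length(Decoded));
end.
```

Comparing code points rather than printing the string avoids any dependence on the console's code page or on how the source file itself is encoded.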
Update: If you are using a pre-Unicode version of Delphi (2007 or earlier), Indy uses AnsiString instead of UnicodeString, so TIdURI.URLDecode() will first decode the input to Unicode using the specified AByteEncoding parameter (defaulting to IndyTextEncoding_UTF8 if none is specified), and will then convert the Unicode data to ANSI using the specified ADestEncoding parameter (defaulting to IndyTextEncoding_OSDefault if none is specified).
The Russian input you have shown decodes properly to Unicode when decoded as UTF-8, but can easily lose characters (turning them into '?') during the conversion to ANSI if your code is running on a machine that does not use a Russian charset at the OS layer, such as ISO-8859-5 or KOI8-R.
To ensure a correct conversion, you would have to specify the desired AnsiString encoding on those machines, e.g.:
var
LEncoding, LAnsiEncoding: IIdTextEncoding;
...
LEncoding := IndyTextEncoding_UTF8;
LAnsiEncoding := CharsetToEncoding('ISO-8859-5'); // or 'KOI8-R', etc
...
ARequestInfo.Params.Add(TIdURI.URLDecode(s, LEncoding, LAnsiEncoding));
In Unicode versions of Delphi (2009 and later), Indy uses UnicodeString instead of AnsiString, so there is no ADestEncoding parameter present.
Upvotes: 6