Reputation: 2461
Is it possible to convert XML to UTF-8 encoding in Delphi 6?
This is what I am currently doing:
WideStringVariable := AnsiToUtf8(Doc.XML.Text);
Then I write WideStringVariable to a file using TFileStream, adding the UTF-8 BOM at the beginning of the file.
Code:
procedure SaveAsUTF8(const Name: string; Data: TStrings);
const
  cUTF8 = $BFBBEF;
var
  W_TXT: WideString;
  fs: TFileStream;
  wBOM: Integer;
begin
  if Trim(Data.Text) <> '' then begin
    W_TXT := AnsiToUTF8(Data.Text);
    fs := TFileStream.Create(Name, fmCreate);
    try
      wBOM := cUTF8;
      fs.WriteBuffer(wBOM, SizeOf(wBOM) - 1);
      fs.WriteBuffer(W_TXT[1], Length(W_TXT) * SizeOf(W_TXT[1]));
    finally
      fs.Free;
    end;
  end;
end;
If I open the file in Notepad++ or another editor that detects encoding, it shows UTF-8 with BOM. However, the text does not seem to be properly encoded.
What is wrong and how can I fix it?
UPDATE: XML Properties:
XMLDoc.Version := '1.0';
XMLDoc.Encoding := 'UTF-8';
XMLDoc.StandAlone := 'yes';
Upvotes: 1
Views: 13297
Reputation: 11
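Another solution (note that TStreamWriter and TEncoding require Delphi 2009 or later, so this will not compile in Delphi 6):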
procedure SaveAsUTF8(const Name: string; Data: TStrings);
var
  fs: TFileStream;
  vStreamWriter: TStreamWriter;
begin
  fs := TFileStream.Create(Name, fmCreate);
  try
    vStreamWriter := TStreamWriter.Create(fs, TEncoding.UTF8);
    try
      vStreamWriter.Write(Data.Text);
    finally
      vStreamWriter.Free;
    end;
  finally
    fs.Free;
  end;
end;
Upvotes: 1
Reputation: 16045
You can save the file using the standard SaveToFile method of the TXMLDocument variable: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/XMLDoc_TXMLDocument_SaveToFile.html
Whether the resulting file is actually UTF-8 is something you have to check with local tools, such as the aforementioned Notepad++ or a hex editor.
If you insist on using an intermediate string and a file stream, you should use the proper variable type. AnsiToUTF8 returns a UTF8String, and that is the type to use.
Compiling `WideStringVar := AnsiStringSource` may issue a compiler warning, depending on the Delphi version, and it is a justified warning. Googling for "Delphi WideString", or reading the Delphi manuals on the topic, shows that WideString (aka Microsoft OLE BSTR) keeps data in UTF-16 format. http://delphi.about.com/od/beginners/l/aa071800a.htm
Thus the assignment "UTF-16 string <= 8-bit source" necessarily converts the data, so dumping the WideString data cannot produce UTF-8 text, by the very definition of WideString.
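To make the size difference concrete, here is a minimal console sketch. It assumes a pre-Unicode Delphi such as Delphi 6, where AnsiToUtf8 is available from the System unit; the program name and the sample literal 'abc' are made up for illustration. It compares how many bytes WriteBuffer would be asked to write from a UTF8String and from a WideString holding the same text.

```pascal
program WideVsUtf8;  // hypothetical program name
{$APPTYPE CONSOLE}

var
  U: UTF8String;
  W: WideString;
begin
  // Plain ASCII sample, so the result does not depend on the ANSI code page.
  U := AnsiToUtf8('abc');

  // Implicit conversion: every 8-bit character is widened to a 16-bit one.
  W := U;

  // Same expressions as in the SaveAsUTF8 procedure under discussion.
  WriteLn('UTF8String bytes: ', Length(U) * SizeOf(U[1]));
  WriteLn('WideString bytes: ', Length(W) * SizeOf(W[1]));
end.
```

For this ASCII input the UTF8String occupies 3 bytes while the WideString occupies 6, with every second byte being zero. Writing that buffer after a UTF-8 BOM gives a file that editors label as UTF-8 but whose content is really UTF-16.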
A corrected version of your procedure, using UTF8String:
procedure SaveAsUTF8(const Name: string; Data: TStrings);
const
  // UTF-8 byte order mark
  cUTF8: array [1..3] of Byte = ($EF, $BB, $BF);
var
  W_TXT: UTF8String;
  fs: TFileStream;
  Trimmed: AnsiString;
begin
  Trimmed := Trim(Data.Text);
  if Trimmed <> '' then begin
    W_TXT := AnsiToUTF8(Trimmed);  // stays 8-bit, no UTF-16 conversion
    fs := TFileStream.Create(Name, fmCreate);
    try
      fs.WriteBuffer(cUTF8[1], SizeOf(cUTF8));
      fs.WriteBuffer(W_TXT[1], Length(W_TXT) * SizeOf(W_TXT[1]));
    finally
      fs.Free;
    end;
  end;
end;
BTW, your code would not create even an empty file if the source data were empty. That looks rather suspicious, though it is up to you to decide whether it is an error with respect to the rest of your program.
Properly "uploading" the received file or stream to the web is yet another issue (to be asked as a separate question on a Q&A site like SO), related to testing conformance with HTTP. As a starting point, you can read some hints in "WWW server reports error after POST Request by Internet Direct components in Delphi".
Upvotes: 3
Reputation: 612964
You simply need to call the SaveToFile method of the document:
XMLDoc.SaveToFile(FileName);
Since you specified the encoding already, the component will use that encoding.
This won't include a BOM, but that's generally what you want for an XML file. The content of the file will specify the encoding.
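As a rough illustration of that approach, the sketch below builds a tiny document and saves it with the declared encoding. It assumes the default MSXML DOM vendor (hence the COM initialization, which a VCL application normally does for you) and uses NewXMLDocument from the XMLDoc unit; the program name, the element name 'root' and the file name 'sample.xml' are placeholders.

```pascal
program SaveXmlUtf8;  // hypothetical program name
{$APPTYPE CONSOLE}

uses
  ActiveX, XMLIntf, XMLDoc;

var
  Doc: IXMLDocument;
begin
  CoInitialize(nil);              // the MSXML vendor is COM-based
  try
    Doc := NewXMLDocument;        // version defaults to '1.0'
    Doc.Encoding := 'UTF-8';      // declared in the XML prolog
    Doc.StandAlone := 'yes';
    Doc.AddChild('root').Text := 'sample';  // placeholder content
    Doc.SaveToFile('sample.xml'); // placeholder file name
    Doc := nil;                   // release the document before CoUninitialize
  finally
    CoUninitialize;
  end;
end.
```

Opening the result in a hex editor is a quick way to confirm that the file starts directly with the XML declaration rather than with the bytes EF BB BF.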
As regards your SaveAsUTF8 method, it is not needed, but it is easy to fix, and fixing it may be instructive to you.
The problem is that you are converting to UTF-16 when you assign to a WideString variable. You should instead put the UTF-8 text into an AnsiString variable. Changing the type of the variable that you named W_TXT to AnsiString is enough.
The function might look like this:
procedure SaveAsUTF8(const Name: string; Data: TStrings);
const
  UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
var
  utf8: AnsiString;
  fs: TFileStream;
begin
  utf8 := AnsiToUTF8(Data.Text);
  fs := TFileStream.Create(Name, fmCreate);
  try
    fs.WriteBuffer(UTF8BOM, SizeOf(UTF8BOM));
    fs.WriteBuffer(Pointer(utf8)^, Length(utf8));
  finally
    fs.Free;
  end;
end;
Upvotes: 2
Reputation: 1933
In order to have the correct encoding inside the document, you should set it by using the Encoding property in your XML Document, like this:
myXMLDocument.Encoding := 'UTF-8';
I hope this helps.
Upvotes: 3