Reputation: 4370
I load a text file using this code, taken from "How to read a text file that contains 'NULL CHARACTER' in Delphi?" (my file is encoded as UTF-8):
uses
  IOUtils;
var
  s: string;
  ss: TStringStream;
begin
  s := TFile.ReadAllText('c:\MyFile.txt');
  s := StringReplace(s, #0, '', [rfReplaceAll]); // Removes NULL chars
  ss := TStringStream.Create(s);
  try
    RichEdit1.Lines.LoadFromStream(ss, TEncoding.UTF8); // UTF8
  finally
    ss.Free;
  end;
end;
My problem is that RichEdit1 doesn't load the whole text.
It's not because of the null characters. It's because of the encoding. When I run the application with this code, it loads the whole text:
uses
  IOUtils;
var
  s: string;
  ss: TStringStream;
begin
  s := TFile.ReadAllText('c:\MyFile.txt');
  s := StringReplace(s, #0, '', [rfReplaceAll]); // Removes NULL chars
  ss := TStringStream.Create(s);
  try
    RichEdit1.Lines.LoadFromStream(ss, TEncoding.Default);
  finally
    ss.Free;
  end;
end;
I changed TEncoding.UTF8 to TEncoding.Default. The whole text loaded, but it's not in the right format and it's not readable.
I guess there are some characters that UTF-8 doesn't support, so the loading process stops when it tries to load such a character.
Please help. Any workarounds?
EDIT:
I'm sure it's UTF-8 and it's plain text. It's an HTML source file. I'm sure it has null chars, as I saw them using Notepad++. The value of RichEdit1.PlainText is True.
Upvotes: 3
Views: 5554
Reputation: 598174
Since you are loading an HTML file, you need to pre-parse the HTML and check whether its <head> tag contains a <meta> tag specifying a charset. If it does, you must load the HTML using that charset, or else it will not decode to Unicode correctly.
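As a rough illustration of that pre-parse step, the sketch below scans the start of the file for a charset= declaration. SniffHtmlCharset and the 2048-byte limit are hypothetical choices made for this example, and it is a naive text search rather than a real HTML parser, so treat the result as a hint only.

```delphi
uses
  SysUtils, IOUtils;

// Hypothetical helper: returns the lowercased charset name declared in the
// HTML (e.g. 'utf-8'), or '' if no charset= declaration is found.
function SniffHtmlCharset(const FileName: string): string;
const
  MaxScan = 2048; // assumption: the <meta> tag appears early in the file
var
  Bytes: TBytes;
  Head: string;
  Count, I: Integer;
begin
  Result := '';
  Bytes := TFile.ReadAllBytes(FileName);
  Count := Length(Bytes);
  if Count > MaxScan then
    Count := MaxScan;
  if Count = 0 then
    Exit;
  // ASCII is enough here: tag names and charset names are plain ASCII
  Head := LowerCase(TEncoding.ASCII.GetString(Bytes, 0, Count));
  I := Pos('charset=', Head);
  if I = 0 then
    Exit;
  Inc(I, Length('charset='));
  // Skip an optional opening quote, as in <meta charset="utf-8">
  if (I <= Length(Head)) and CharInSet(Head[I], ['"', '''']) then
    Inc(I);
  // Collect the characters that can appear in a charset name
  while (I <= Length(Head)) and
    CharInSet(Head[I], ['a'..'z', '0'..'9', '-', '_', '.', ':']) do
  begin
    Result := Result + Head[I];
    Inc(I);
  end;
end;
```

If this returns 'utf-8', loading with TEncoding.UTF8 is reasonable; for any other name you would need to pick the matching encoding yourself.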
If there is no charset specified in the HTML, you have to choose an appropriate charset, or ask the user. For instance, if you are downloading the HTML from a webserver, you can check if a charset is specified in the HTTP Content-Type header, and if so then save that charset with (or even in) the HTML so you can use it later. Otherwise, the default charset for downloaded HTML is usually ISO-8859-1 unless known otherwise.
The only time you should ever load HTML as UTF-8 is if you know for a fact that the HTML is actually UTF-8 encoded. You cannot just blindly assume the HTML is UTF-8 encoded, unless you are the one who created the HTML in the first place.
From what you have described, it sounds like your HTML is not UTF-8. But it is hard to know for sure since you did not provide the file that you are trying to load.
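One way to test that suspicion is to check whether the raw bytes of the file form valid UTF-8 at all. The sketch below is Windows-only and relies on the Win32 MultiByteToWideChar function with the MB_ERR_INVALID_CHARS flag. FileIsValidUtf8 is a hypothetical helper name used only for this example.

```delphi
uses
  Windows, SysUtils, IOUtils;

// Hypothetical helper: True if the whole file decodes as strict UTF-8.
function FileIsValidUtf8(const FileName: string): Boolean;
var
  Bytes: TBytes;
begin
  Bytes := TFile.ReadAllBytes(FileName);
  if Length(Bytes) = 0 then
  begin
    Result := True; // an empty file is trivially valid
    Exit;
  end;
  // With MB_ERR_INVALID_CHARS the call fails (returns 0) on an invalid
  // byte sequence instead of silently substituting a replacement character.
  // Passing nil/0 for the output buffer only asks for the required size.
  Result := MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
    PAnsiChar(@Bytes[0]), Length(Bytes), nil, 0) > 0;
end;
```

Embedded null bytes do not break this check, because the length is passed explicitly. If the function returns False for c:\MyFile.txt, the file is not UTF-8, whatever an editor reports.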
Upvotes: 2
Reputation: 47829
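You should pass the encoding to TFile.ReadAllText. Without it, ReadAllText can only detect the encoding from a BOM and otherwise falls back to the default ANSI encoding. After that you are working with Unicode strings only and don't have to bother with UTF-8 in the RichEdit.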
var
  s: string;
begin
  s := TFile.ReadAllText('c:\MyFile.txt', TEncoding.UTF8);
  // normally this shouldn't be necessary
  s := StringReplace(s, #0, '', [rfReplaceAll]); // Removes NULL chars
  RichEdit1.Lines.Text := s;
end;
Upvotes: 14