Franz

Reputation: 2043

Encode ASCII Files

I guess this question has a very simple answer: yes or no.

If I save my string list from a 64-bit Unicode Delphi application like this

StringList.SaveToFile(FileName, TEncoding.ASCII);

is there any other limitation or difference in file layout compared with writing the file using the statement

StringList.SaveToFile(FileName);

or

StringList.SaveToFile(FileName, TEncoding.UTF8);

I'm worried about line-length and control-character issues between these versions. The answer NO would make me happy.

Upvotes: 0

Views: 2396

Answers (2)

David Heffernan

Reputation: 612954

The difference is simply in the encoding used. This in turn, of course, leads to differences in size. ASCII files will be smaller than UTF-16 files (which is what you get with TEncoding.Unicode). UTF-8 files can be the same size as ASCII, or even larger than UTF-16, depending on which characters the text contains.
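To make the size point concrete, here is a minimal sketch that counts the encoded bytes for two sample strings. The sample strings are assumptions chosen for illustration, and the counts exclude any BOM and line breaks that SaveToFile would add.

```pascal
program EncodingSizes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  AsciiOnly, EastAsian: string;
begin
  // Hypothetical sample strings, chosen only for illustration.
  AsciiOnly := 'Hello';
  EastAsian := #$65E5#$672C#$8A9E; // three CJK characters from the BMP

  // Pure ASCII text: one byte per character in ASCII and UTF-8,
  // two bytes per character in UTF-16.
  Writeln('ASCII  / ascii text: ', TEncoding.ASCII.GetByteCount(AsciiOnly));
  Writeln('UTF-8  / ascii text: ', TEncoding.UTF8.GetByteCount(AsciiOnly));
  Writeln('UTF-16 / ascii text: ', TEncoding.Unicode.GetByteCount(AsciiOnly));

  // These CJK characters need three bytes each in UTF-8 but only
  // two bytes each in UTF-16, so UTF-8 is the larger encoding here.
  Writeln('UTF-8  / CJK text:   ', TEncoding.UTF8.GetByteCount(EastAsian));
  Writeln('UTF-16 / CJK text:   ', TEncoding.Unicode.GetByteCount(EastAsian));
end.
```

For text that stays within the ASCII range, the ASCII and UTF-8 counts should match, which is why the two encodings produce the same payload for such content.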

I guess you are asking if using ASCII or UTF-8 in any way damages the text that is written. Using ASCII will, if the text contains non-ASCII characters. ASCII can only encode 128 characters (code points 0 to 127).
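The lossy behaviour of ASCII can be checked without touching the file system. The following is a small sketch, assuming a sample string with one accented character; it encodes to bytes and decodes again, then compares the result with the original.

```pascal
program AsciiRoundTrip;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Original, Decoded: string;
  Bytes: TBytes;
begin
  // Hypothetical sample: 'caf' followed by U+00E9 (e with acute accent),
  // which lies outside the ASCII range.
  Original := 'caf' + #$00E9;

  Bytes := TEncoding.ASCII.GetBytes(Original);
  Decoded := TEncoding.ASCII.GetString(Bytes);

  // The accented character cannot be represented in ASCII, so the
  // decoded text is expected to differ from the original. Which
  // substitute character appears depends on the platform's conversion.
  Writeln('Round trip preserved text: ', Original = Decoded);
end.
```

Running the same check on your real data before choosing TEncoding.ASCII is a cheap way to find out whether anything would be lost.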

On the other hand, UTF-8 is a full encoding of Unicode. Which means that

StringList.SaveToFile(FileName, TEncoding.UTF8);
StringList.LoadFromFile(FileName, TEncoding.UTF8);

results in the list having exactly the same content as it did before the save.
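As a quick way to convince yourself of this, here is a sketch that saves a list containing non-ASCII text, loads it into a second list, and compares the two. The file name and sample lines are assumptions made up for the example.

```pascal
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils;

var
  Source, Loaded: TStringList;
  FileName: string;
begin
  // Hypothetical file name in the temp folder, used only for this sketch.
  FileName := TPath.Combine(TPath.GetTempPath, 'utf8_roundtrip_demo.txt');

  Source := TStringList.Create;
  try
    Loaded := TStringList.Create;
    try
      Source.Add('plain ASCII line');
      Source.Add('Gr' + #$00FC + #$00DF + 'e'); // German text with umlaut and sharp s
      Source.Add(#$65E5#$672C#$8A9E);           // three CJK characters

      Source.SaveToFile(FileName, TEncoding.UTF8);
      Loaded.LoadFromFile(FileName, TEncoding.UTF8);

      // Compares line count and every line of both lists.
      Writeln('Identical after reload: ', Source.Equals(Loaded));
    finally
      Loaded.Free;
    end;
  finally
    Source.Free;
  end;
end.
```

The sketch assumes that no individual string contains an embedded line break, since such a string would be read back as more than one line.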

You ask if lines can be truncated by SaveToFile. They cannot.

Another point to make is that 32/64 bit is not relevant here. The code behaves in exactly the same way under 32 and 64 bit. The issues are always to do with encoding.

I would also note that the title of your question is somewhat misleading. When you encode with TEncoding.UTF8 you do not have an ASCII file.

Upvotes: 2

Chris Rolliston

Reputation: 4808

UTF-8 and the Windows 'Ansi' codepages are all supersets of ASCII. As such, if the string list only contains characters in the ASCII range, the three statements you listed will produce equivalent files, provided you precede the last one with this:

StringList.WriteBOM := False;

This is because by default, TStrings will write out a small marker (a BOM) to denote UTF-8 text.
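The effect of the BOM can be observed by comparing file sizes. This is a rough sketch, assuming a Delphi version whose TStrings has the WriteBOM property and a made-up file name in the temp folder; it writes the same ASCII-only list three ways and prints the resulting sizes.

```pascal
program BomCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils;

var
  List: TStringList;
  FileName: string;
begin
  // Hypothetical file name, used only for this sketch.
  FileName := TPath.Combine(TPath.GetTempPath, 'bom_demo.txt');

  List := TStringList.Create;
  try
    List.Add('abc'); // ASCII-only content

    List.SaveToFile(FileName, TEncoding.ASCII);
    Writeln('ASCII:             ', Length(TFile.ReadAllBytes(FileName)));

    // Default behaviour: the UTF-8 BOM (bytes EF BB BF) is written first,
    // so this file is expected to be three bytes larger.
    List.SaveToFile(FileName, TEncoding.UTF8);
    Writeln('UTF-8 with BOM:    ', Length(TFile.ReadAllBytes(FileName)));

    // With the BOM suppressed, the size should match the ASCII file.
    List.WriteBOM := False;
    List.SaveToFile(FileName, TEncoding.UTF8);
    Writeln('UTF-8 without BOM: ', Length(TFile.ReadAllBytes(FileName)));
  finally
    List.Free;
  end;
end.
```

Whether to suppress the BOM depends on what reads the file afterwards: some tools rely on it to detect UTF-8, while others treat the extra bytes as unexpected content.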

Upvotes: 3
