Mike Jablonski

Reputation: 1765

What determines if a variable of type UnicodeString represents a Unicode string or an ANSI string?

I'm experienced with Delphi but new to Unicode.

The embedded Delphi XE2 documentation about UnicodeString (System.UnicodeString) says:

"Delphi utilizes several string types. UnicodeString can contain both Unicode and ANSI strings.

Support for this type includes the following features:

- Strings as large as available memory.
- Efficient use of memory through shared references.
- Routines and operators that evaluate strings based on the current locale.

Despite its name, UnicodeString can represent both ANSI character set strings and Unicode strings."

I don't understand what is meant by the word "can." ("It can contain both Unicode and ANSI." ... "Despite its name, UnicodeString can represent both ANSI character set strings and Unicode strings.")

My question: what determines if a variable of type UnicodeString represents a Unicode string or an ANSI string?

Upvotes: 0

Views: 432

Answers (1)

Remy Lebeau

Reputation: 598309

The documentation is outdated. UnicodeString in XE2 can only contain Unicode data.

In C++Builder 2009 and Delphi 2009, when UnicodeString was first introduced, there were cases, mostly in C++<->Delphi interactions, where the RTL allowed Ansi data to be stored in a UnicodeString and Unicode data to be stored in an AnsiString, to help users migrate legacy Ansi code to Unicode. UnicodeString and AnsiString share a unified internal structure, and the Delphi compiler had a {$STRINGCHECKS} directive that would detect any discrepancies and perform silent data conversions when needed. Although it did work, it also had subtle side effects if you were not careful with it.

By the time XE was released, Embarcadero figured users had had enough time to migrate, so the {$STRINGCHECKS} directive and the supporting RTL functionality were removed. UnicodeString and AnsiString still have a unified internal structure, so it is technically possible to store Ansi data in a UnicodeString and Unicode data in an AnsiString, but you would have to manipulate memory directly to do it. The compiler/RTL will not do it in "normal" code, and will no longer perform silent conversions when discrepancies exist, so data corruption and/or crashes can occur if you are not careful.
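To see the unified structure in practice, here is a minimal sketch. It assumes the System-unit helpers StringElementSize and StringCodePage (available since Delphi 2009), which read the element size and code page recorded in a string's header. The program and variable names are only illustrative.

```pascal
program StringKindDemo;

{$APPTYPE CONSOLE}

var
  U: UnicodeString;
  A: AnsiString;
begin
  // Use non-empty strings: an empty string has no header to inspect.
  U := 'abc';
  A := 'abc';

  // Both string types carry the same kind of header, which records
  // the size of one element in bytes and the code page of the payload.
  Writeln('UnicodeString element size: ', StringElementSize(U));
  Writeln('UnicodeString code page:    ', StringCodePage(U));
  Writeln('AnsiString element size:    ', StringElementSize(A));
  Writeln('AnsiString code page:       ', StringCodePage(A));

  // In normal code, crossing between the two types goes through a
  // conversion, so the UnicodeString ends up holding UTF-16 data.
  U := UnicodeString(A);
  Writeln('After conversion, element size: ', StringElementSize(U));
end.
```

In XE2 I would expect the UnicodeString lines to report an element size of 2 and code page 1200 (UTF-16LE), and the AnsiString an element size of 1 with whatever Ansi code page applies on your system. In "normal" code those header values always agree with the declared type, which is why the documentation's "can represent both" wording no longer applies.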

Upvotes: 3
