Reputation: 72514
I have to convert a large legacy application to Delphi 2009. It uses strings, AnsiStrings, WideStrings and UTF-8 data all over the place, and I am having a hard time understanding how the new string types work and how they should be used.
The application fully supported Unicode using TntUnicodeControls and there are 3rd party DLLs which require strings in specific encodings, mostly UTF8 and UTF16, making the conversion task not as trivial as one would suspect.
I especially have problems with the C DLL calls and with choosing the right type. I also get the impression that many implicit string conversions are happening, because one of the DLLs always seems to receive UTF-8 encoded strings, no matter how the Delphi string is encoded.
Can someone please provide a short overview of the new Delphi 2009 string types UnicodeString and RawByteString, perhaps with some usage hints and possible pitfalls when converting a pre-2009 application?
Upvotes: 7
Views: 7798
Reputation: 11
Another thing to watch out for when passing strings between DLLs built with different versions of Delphi or C++ Builder: starting with 2009, the StrRec part of AnsiStringBase gained two extra fields, codePage and elemSize. They are 2 bytes each (16-bit integers), so StrRec is now 12 bytes instead of 8. This can cause invalid pointer exceptions during memory allocation and destruction, even when the data part of the string seems to transfer OK.
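To make the size difference concrete, here is a minimal sketch that mirrors the two header layouts. The record names TStrHeaderPre2009 and TStrHeader2009 are hypothetical (the real StrRec is private to System.pas), and the field order is an assumption based on the 32-bit Delphi 2009 RTL.

```delphi
program StrRecLayout;
{$APPTYPE CONSOLE}

type
  // Hypothetical mirror of the pre-2009 string header (assumed layout).
  TStrHeaderPre2009 = packed record
    RefCnt: Longint;
    Length: Longint;
  end;

  // Hypothetical mirror of the Delphi 2009 header (assumed layout, 32-bit):
  // two 16-bit fields were added in front of the reference count.
  TStrHeader2009 = packed record
    CodePage: Word;
    ElemSize: Word;
    RefCnt: Longint;
    Length: Longint;
  end;

var
  S: AnsiString;
begin
  S := 'abc';
  Writeln('pre-2009 header size: ', SizeOf(TStrHeaderPre2009)); // 8
  Writeln('2009 header size:     ', SizeOf(TStrHeader2009));    // 12
  // The RTL exposes the new fields through these functions,
  // so there is no need to read the header by pointer arithmetic.
  Writeln('code page:    ', StringCodePage(S));
  Writeln('element size: ', StringElementSize(S));
end.
```

A module compiled with the 8-byte header will look for the reference count and length at the wrong offsets in a string allocated by a module that uses the 12-byte header, which fits the invalid pointer errors described above.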
Upvotes: 0
Reputation: 72514
It seems almost all my problems come from the automatic conversion on assignments to UTF8String.
I already had old code using UTF8String just to help me remember which kind of string a variable should contain.
When starting to port my application, I replaced AnsiString with UTF8String for the same reason, but the code depended on UTF8String being just an alias for (classic) AnsiString. Now, with the automatic conversion, that assumption is no longer true, which created many problems.
Be careful if you use UTF8String when porting from pre-2009 Delphi code!
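A small sketch of the effect, assuming a system ANSI code page (such as Windows-1252) in which U+00E9 fits in one byte. The variable names are made up for illustration.

```delphi
program Utf8AssignDemo;
{$APPTYPE CONSOLE}

var
  Legacy: AnsiString;   // tagged with the system ANSI code page
  Utf8: UTF8String;     // tagged with code page 65001 since Delphi 2009
  Raw: RawByteString;   // has no fixed code page of its own
begin
  Legacy := #$E9;       // U+00E9, stored in the ANSI code page

  Utf8 := Legacy;       // Delphi 2009 transcodes ANSI -> UTF-8 here
  Raw := Legacy;        // payload and code page tag are copied unchanged

  // Under the code page assumption above, the UTF8String should be
  // longer than the AnsiString, because U+00E9 takes two bytes in UTF-8.
  Writeln('AnsiString bytes:     ', Length(Legacy));
  Writeln('UTF8String bytes:     ', Length(Utf8));
  Writeln('RawByteString bytes:  ', Length(Raw));
  Writeln('UTF8String code page: ', StringCodePage(Utf8));
end.
```

Before 2009 the assignment to UTF8String was a plain copy, so old code that stored already-encoded UTF-8 bytes in an AnsiString and then assigned it to a UTF8String would now have those bytes encoded a second time.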
Upvotes: 0
Reputation: 26356
Note that it does not only hit real string code. It also hits code where PChar is used to trawl through buffers or to interface with APIs.
E.g. initialization code of headers that load the DLL dynamically (GetProcAddress/LoadLibrary).
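A hedged sketch of both cases. The buffer contents are made up, and kernel32.dll with GetTickCount is used only as a library and export that exists on every Windows system.

```delphi
program PCharBufferDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

const
  // Made-up byte data, as it might arrive from a file, a socket or a C DLL.
  Buffer: array[0..5] of Byte = (Ord('a'), Ord('b'), Ord('c'), 0, 0, 0);

var
  P: PAnsiChar;
  Count: Integer;
  Lib: HMODULE;
  Proc: Pointer;
begin
  // Walk the buffer byte by byte. With PChar (PWideChar in Delphi 2009),
  // Inc(P) would advance two bytes per step and misread this data.
  P := PAnsiChar(@Buffer[0]);
  Count := 0;
  while P^ <> #0 do
  begin
    Inc(P);
    Inc(Count);
  end;
  Writeln('bytes before terminator: ', Count); // 3

  // Export names in a DLL are ANSI, so pass the name as PAnsiChar.
  Lib := LoadLibrary('kernel32.dll');
  if Lib <> 0 then
  try
    Proc := GetProcAddress(Lib, PAnsiChar('GetTickCount'));
    Writeln('export found: ', Proc <> nil);
  finally
    FreeLibrary(Lib);
  end;
end.
```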
Upvotes: 0
Reputation: 24483
Watch my CodeRage 4 talk on "Using Unicode and Other Encodings in your Programs" this Friday, or wait until the replay is available online.
I'm going to cover some encodings and explain about the string format.
The slides will be available shortly (I'll try to get them online today) and contain a lot of references to stuff you should read on the internet (but I must admit I forgot the link to Joel on Unicode that eed3si9n posted).
Will edit this answer today with the uploads and the links.
Edit:
If you have a small sample where you can show that your C/C++ DLL receives the strings UTF8 encoded, but thought they should be encoded otherwise, please post it (mail me; almost anything at the pluimers dot com gets to me, especially if you use my first name before the at sign).
Session materials can be downloaded now, including the "Using Unicode and Other Encodings in your Programs" session.
These are links from that session:
Read these:
Relevant on-line help topics:
Hope this gets you going. If not, mail me and I'll try to extend the answer here.
Upvotes: 8
Reputation: 95624
See "Delphi and Unicode", a white paper written by Marco Cantù, and I guess also "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", written by Joel Spolsky.
One pitfall is that the default Win32 API calls have been mapped to the W (wide string) versions instead of the A (ANSI) versions. For example, ShellExecute now resolves to ShellExecuteW rather than ShellExecuteA.
If your code does tricky pointer work that assumes the internal layout of AnsiString, it will break. A fallback is to substitute PChar with PAnsiChar, Char with AnsiChar, and string with AnsiString, and to append A to the Win32 API calls in that portion of the code. Once the code compiles and runs normally, you can refactor it to use string (UnicodeString).
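As a rough illustration of that fallback, the sketch below keeps one routine on explicit ANSI types and shows the later refactored form next to it. The routine names and the file path are hypothetical.

```delphi
program AnsiFallbackDemo;
{$APPTYPE CONSOLE}

uses
  Windows, ShellAPI;

// Step 1: pin the legacy routine to ANSI types and the A entry point,
// so it behaves as it did before Delphi 2009.
procedure OpenDocumentAnsi(const FileName: AnsiString);
begin
  ShellExecuteA(0, 'open', PAnsiChar(FileName), nil, nil, SW_SHOWNORMAL);
end;

// Step 2 (later): refactor to string (UnicodeString). The unsuffixed
// ShellExecute maps to ShellExecuteW in Delphi 2009.
procedure OpenDocument(const FileName: string);
begin
  ShellExecute(0, 'open', PChar(FileName), nil, nil, SW_SHOWNORMAL);
end;

begin
  // Hypothetical path, for illustration only.
  OpenDocumentAnsi('C:\example\readme.txt');
  OpenDocument('C:\example\readme.txt');
end.
```

Keeping the ANSI version isolated in its own routine makes it easy to find the remaining spots that still need refactoring.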
Upvotes: 11