Reputation: 75
I am having a problem converting a UTF-8 encoded string back into something usable by Delphi. The application is written in XE8 and is being deployed on Windows and OSX. It uses the LimeLM API DLL and dylib libraries on Windows and OSX respectively. Everything works fine on Windows; the issue I have is converting strings returned from the dylib on OSX. I appreciate that all strings going in and out of the dylib need to be UTF-8 encoded. The LimeLM function returns a PWideChar value which I assume will be UTF-8 encoded. But it doesn't matter which function I use to try to convert the value into something usable in Delphi, all I get is garbage.
Here is the function:
class function TurboActivate.GetFeatureValue(featureName: String): String;
var
value : PWideChar;
FieldName : PWideChar;
tmpStr : String;
begin
{$IFDEF MSWINDOWS}
FieldName := PwideChar(featureName);
{$ENDIF}
{$IFDEF MACOS}
FieldName := PWideChar(UTF8Encode(featureName));
{$ENDIF}
value := GetFeatureValue(FieldName, nil);
if (value = '') then
begin
raise ETurboActivateException.Create('Failed to get feature value. the feature doesn''t exist.');
end;
{$IFDEF MSWINDOWS}
Result := value;
{$ENDIF}
{$IFDEF MACOS}
tmpStr := UTF8ToString(value);
ShowMessage(tmpStr);
tmpStr := UTF8ToWideString(value);
ShowMessage(tmpStr);
tmpStr := UTF8ToUnicodeString(value);
ShowMessage(tmpStr);
tmpStr := UTF8ToAnsi(value);
ShowMessage(tmpStr);
Result := TmpStr;
{$ENDIF}
end;
There is definitely a value to decode, value = '散汤湡獤杀潯汧浥楡潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4
but tmpStr always contains '??????????c??????/'
Any help would be greatly appreciated.
Upvotes: 4
Views: 4004
Reputation: 612794
value = '散汤湡獤杀潯汧浥楡潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4
This indicates that you are interpreting 8-bit text, presumably UTF-8 encoded, as if it were UTF-16 encoded. As a broad rule, when you see a UTF-16 string with Chinese characters, it is either correctly interpreted Chinese text or misinterpreted 8-bit text.
When you interpret that text correctly as UTF-8 it is:
[email protected] 4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA♥♦
I obtained that with this code:
Writeln(TEncoding.UTF8.GetString(
TEncoding.Unicode.GetBytes('散汤湡獤杀潯汧浥楡潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4)));
Do note however, that if you look at the byte array returned by TEncoding.Unicode.GetBytes('散汤湡獤杀潯汧浥楡潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4)
then you will see that it contains a null byte. So the string is actually null-terminated immediately after the e-mail address.
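To make the null byte easier to see, here is a minimal sketch that dumps the bytes of a two-character fragment of that string: the 'm' at the end of the e-mail address followed by the next character, U+4334. The program name is made up for illustration, and the characters are written as numeric escapes so the result does not depend on the source file's encoding.

```pascal
program FindNullByte;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Bytes: TBytes;
  I: Integer;
begin
  // 'm' (U+006D) followed by U+4334, the two UTF-16 code units that
  // straddle the end of the e-mail address in the question's data.
  Bytes := TEncoding.Unicode.GetBytes(#$006D#$4334);

  // TEncoding.Unicode is UTF-16 little-endian, so each code unit is
  // written low byte first.
  for I := 0 to High(Bytes) do
    Write(IntToHex(Bytes[I], 2), ' ');
  Writeln;

  // Report the position of the first zero byte, if there is one.
  for I := 0 to High(Bytes) do
    if Bytes[I] = 0 then
    begin
      Writeln('First zero byte at index ', I);
      Break;
    end;
end.
```

The high byte of 'm' is zero, and that is the byte a PAnsiChar reader treats as the terminator. The two bytes that follow it, 34 and 43, are the ASCII characters '4' and 'C' that begin the trailing text.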
The problems start here:
value : PWideChar;
....
value := GetFeatureValue(FieldName, nil);
In fact GetFeatureValue returns PAnsiChar. And the payload is UTF-8 encoded, assuming I am interpreting you correctly.
So you need to make the following changes:
1. Change the return type of GetFeatureValue to be PAnsiChar.
2. Change the type of value to be PAnsiChar.
3. Convert value to a string using UnicodeFromLocaleChars or TEncoding.GetString.
That might look like this:
var
Bytes: TBytes;
....
SetLength(Bytes, StrLen(value));
Move(value^, Pointer(Bytes)^, Length(Bytes));
str := TEncoding.UTF8.GetString(Bytes);
Now, for the data in the question, that sets str to [email protected]. As mentioned above, the data contains a null terminator, which fails to terminate the string when it is erroneously interpreted as UTF-16. That is, the text 4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA♥♦ comes from a buffer overrun.
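Putting the pieces together, the conversion could be wrapped in a small helper. This is only a sketch: Utf8PtrToString and the Sample buffer are hypothetical names that are not part of LimeLM or the RTL, and it assumes the library really does hand back a null-terminated UTF-8 buffer. It counts the length by hand rather than calling StrLen, so it does not depend on which unit provides the PAnsiChar overload.

```pascal
program Utf8PtrDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: copies a null-terminated UTF-8 buffer into a
// byte array and decodes it to a native (UTF-16) Delphi string.
function Utf8PtrToString(const Value: PAnsiChar): string;
var
  Bytes: TBytes;
  Len: Integer;
begin
  Result := '';
  if Value = nil then
    Exit;

  // Count the bytes up to, but not including, the null terminator.
  Len := 0;
  while Value[Len] <> #0 do
    Inc(Len);
  if Len = 0 then
    Exit;

  SetLength(Bytes, Len);
  Move(Value^, Bytes[0], Len);
  Result := TEncoding.UTF8.GetString(Bytes);
end;

const
  // Stand-in for a buffer returned by the library: 'abc', a terminator,
  // then unrelated bytes that must never be read.
  Sample: array[0..5] of AnsiChar = ('a', 'b', 'c', #0, 'X', 'Y');
begin
  // Decoding stops at the terminator, so 'XY' is not part of the result.
  Writeln(Utf8PtrToString(@Sample[0]));
end.
```

The same reasoning probably applies in the other direction. Casting the result of UTF8Encode to PWideChar, as the question does, passes 8-bit data through a 16-bit pointer type, so something like PAnsiChar(UTF8String(featureName)) is likely what is needed, assuming the library expects UTF-8 input as well.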
Upvotes: 8