CapNemo101

Reputation: 75

Delphi - converting string back from UTF-8

I am having a problem converting a UTF-8 encoded string back into something usable by Delphi. The application is written in XE8 and is being deployed on Windows and OSX. It uses the LimeLM API dll and dylib libraries on Windows and OSX respectively. Everything works fine on Windows; the issue I have is converting strings returned from the dylib library on OSX. I understand that all strings passed in and out of the dylib need to be UTF-8 encoded. The LimeLM function returns a PWideChar value, which I assume will be UTF-8 encoded. But it doesn't matter which function I use to try to convert the value into something usable in Delphi, all I get is garbage.

Here is the function :

class function TurboActivate.GetFeatureValue(featureName: String): String;
var
    value : PWideChar;
    FieldName : PWideChar;
    tmpStr : String;
begin

    {$IFDEF MSWINDOWS}
    FieldName := PwideChar(featureName);
    {$ENDIF}
    {$IFDEF MACOS}
    FieldName := PWideChar(UTF8Encode(featureName));
    {$ENDIF}


    value := GetFeatureValue(FieldName, nil);

    if (value = '') then
    begin
        raise ETurboActivateException.Create('Failed to get feature value.  the feature doesn''t exist.');
    end;
    {$IFDEF MSWINDOWS}
    Result := value;
    {$ENDIF}
    {$IFDEF MACOS}
    tmpStr :=  UTF8ToString(value);
    ShowMessage(tmpStr);
    tmpStr :=  UTF8ToWideString(value);
    ShowMessage(tmpStr);
    tmpStr :=  UTF8ToUnicodeString(value);
    ShowMessage(tmpStr);
    tmpStr :=  UTF8ToAnsi(value);
    ShowMessage(tmpStr);

    Result := TmpStr;
    {$ENDIF}

end; 

There is definitely a value to decode, value = '散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4

but tmpStr always contains '??????????c??????/'

Any help would be greatly appreciated.

Upvotes: 4

Views: 4004

Answers (1)

David Heffernan

Reputation: 612794

value = '散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4

This indicates that you are interpreting 8-bit text, presumably UTF-8 encoded, as if it were UTF-16 encoded. As a broad rule, when you see a UTF-16 string full of Chinese characters, either it is correctly interpreted Chinese text, or it is misinterpreted 8-bit text.
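To make the effect concrete, here is a minimal sketch that deliberately misreads two ASCII bytes as one UTF-16 code unit. The function name MisreadUtf8AsUtf16 is made up for illustration and the input 'ce' is an arbitrary sample; only the TEncoding calls are standard RTL.

```delphi
program MisreadDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: takes the UTF-8 bytes of S and decodes them
// as if they were UTF-16LE, which is the mistake described above.
function MisreadUtf8AsUtf16(const S: string): string;
begin
  Result := TEncoding.Unicode.GetString(TEncoding.UTF8.GetBytes(S));
end;

var
  Garbled: string;
begin
  Garbled := MisreadUtf8AsUtf16('ce');
  // The bytes $63 $65 are paired into the single code unit U+6563,
  // which lies in the CJK ideograph range.
  Writeln(Length(Garbled));                        // 1
  Writeln(IntToHex(Ord(Garbled.Chars[0]), 4));     // 6563
end.
```

Every pair of 8-bit characters collapses into one 16-bit character this way, which is why the misread string is roughly half as long as the real text.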

When you interpret that text correctly as UTF-8 it is:

[email protected] 4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA♥♦

I obtained that with this code:

  Writeln(TEncoding.UTF8.GetString(
    TEncoding.Unicode.GetBytes('散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4)));

Note, however, that the byte array returned by that TEncoding.Unicode.GetBytes call contains a null byte. So the string is actually null-terminated immediately after the e-mail address.
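One way to check for such a null is to scan the byte array and decode only the bytes before it. The following is a sketch under assumptions: IndexOfNullByte is a hypothetical helper, and the three code units are a short stand-in (the bytes for 'co', 'm' plus a zero byte, and '4C'), not the full value from the question.

```delphi
program NullInBytes;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: index of the first zero byte, or -1 if none.
function IndexOfNullByte(const Bytes: TBytes): Integer;
var
  I: Integer;
begin
  for I := 0 to High(Bytes) do
    if Bytes[I] = 0 then
      Exit(I);
  Result := -1;
end;

var
  Misread: string;
  Bytes: TBytes;
  NullAt: Integer;
begin
  // Stand-in data: UTF-16LE bytes are $63 $6F, $6D $00, $34 $43.
  Misread := #$6F63#$006D#$4334;
  Bytes := TEncoding.Unicode.GetBytes(Misread);
  NullAt := IndexOfNullByte(Bytes);
  Writeln(NullAt);                                       // 3
  // Decode only the bytes in front of the terminator.
  Writeln(TEncoding.UTF8.GetString(Bytes, 0, NullAt));   // com
end.
```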

The problems start here:

value : PWideChar;
....
value := GetFeatureValue(FieldName, nil);

In fact, GetFeatureValue returns PAnsiChar, and the payload is UTF-8 encoded, assuming I am interpreting you correctly.

So you need to make the following changes:

  1. Change the return type of GetFeatureValue to be PAnsiChar.
  2. Change the type of value to be PAnsiChar.
  3. Convert value to a string using UnicodeFromLocaleChars or TEncoding.GetString.

That might look like this:

var
  Bytes: TBytes;
....
SetLength(Bytes, StrLen(value));
Move(value^, Pointer(Bytes)^, Length(Bytes));
str := TEncoding.UTF8.GetString(Bytes);

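Now, for the data in the question, that sets str to [email protected]. As mentioned above, the data contains a null terminator, but it fails to terminate the string when the data is erroneously interpreted as UTF-16: the single zero byte is paired with the preceding m byte to form the code unit U+006D, and a UTF-16 string ends only at a 16-bit zero code unit. That is, the text 4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA♥♦ comes from a buffer overrun.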
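The conversion can be wrapped in a small helper that stops at the first null byte. This is a sketch, not LimeLM's API: Utf8PtrToString is a hypothetical name, the byte array is made-up sample data, and it assumes a desktop compiler (Windows or OSX) where PAnsiChar is available.

```delphi
program Utf8PtrDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: converts a null-terminated UTF-8 buffer to a string.
function Utf8PtrToString(Value: PAnsiChar): string;
var
  Bytes: TBytes;
  Len: Integer;
begin
  Result := '';
  if Value = nil then
    Exit;
  // Count bytes up to, but not including, the null terminator.
  Len := 0;
  while Value[Len] <> #0 do
    Inc(Len);
  if Len = 0 then
    Exit;
  SetLength(Bytes, Len);
  Move(Value^, Bytes[0], Len);
  Result := TEncoding.UTF8.GetString(Bytes);
end;

var
  Raw: TBytes;
begin
  // Sample buffer: 'com', a null terminator, then stray bytes to be ignored.
  Raw := TBytes.Create($63, $6F, $6D, 0, $34, $43);
  Writeln(Utf8PtrToString(PAnsiChar(@Raw[0])));   // com
end.
```

Because the length is measured in bytes up to the first zero byte, anything that happens to sit in memory after the terminator is never decoded.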

Upvotes: 8
