Herman Badenhorst

Reputation: 21

Deserialize JSON for multi-language support

I have tried deserializing with code suggested on this forum.

I use the Google Translate API, which returns a JSON string.
I use Newtonsoft.Json to deserialize it.

My code does not work for foreign-language translations, where characters in the string to deserialize take more than one byte.

The code is shown below:

Public Function getGoogleTranslate(myIncomingText As String) As String
    Dim myUrlString As String
    Dim myLanguageFrom As String
    Dim myLanguageTo As String
    Dim myTextFrom As String
    Dim myNewString As String

    myLanguageFrom = "en"
    myLanguageTo = "fr"
    myTextFrom = myIncomingText

    myUrlString = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl="
    myUrlString &= myLanguageTo
    myUrlString &= "&hl="
    myUrlString &= myLanguageFrom
    myUrlString &= "&dt=t&dt=bd&dj=1&source=icon&q="
    myUrlString &= myIncomingText

    Dim myWebClient As New System.Net.WebClient
    Dim myDownloadString As String = myWebClient.DownloadString(myUrlString)
    Dim myJsonFile As Newtonsoft.Json.Linq.JObject = Newtonsoft.Json.Linq.JObject.Parse(myDownloadString)
    myNewString = myJsonFile.SelectToken("sentences[0]").SelectToken("trans")

    Return myNewString
End Function

The request itself works: I call the API using the URL built in myUrlString.
The response is returned into the string myDownloadString.
Inspecting this string, the translated text appears to be returned correctly.
After the string is parsed, however, only characters in the ASCII range come out as expected; characters used in other languages do not.
It could be the myNewString variable, which is declared as a standard String.
However, if I copy and paste the translated text into a TextBox on the web page, the special characters are accepted and stored correctly in a SQL table.
It gets even more complicated when translating to "ru" (Russian) or "zh" (Simplified Chinese).

I have never worked with a different character set, so I am flying blind, with only this forum for help.

Upvotes: 1

Views: 1157

Answers (3)

Herman Badenhorst

Reputation: 21

Here is the working code for calling the Google Translate API from VB.NET. Note: I have tested it with French and German translations, but not yet with Chinese.

Public Function getGoogleTranslate(myIncomingText As String) As String
    Dim myUrlString As String
    Dim myLanguageFrom As String
    Dim myLanguageTo As String
    Dim myNewString As String

    myLanguageFrom = "en"
    myLanguageTo = "fr"

    myUrlString = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl="
    myUrlString &= myLanguageTo
    myUrlString &= "&hl="
    myUrlString &= myLanguageFrom
    myUrlString &= "&dt=t&dt=bd&dj=1&source=icon&q="
    ' Percent-encode the text so spaces, "&" and non-ASCII characters stay inside the q parameter
    myUrlString &= Uri.EscapeDataString(myIncomingText)

    Using myWebClient As New System.Net.WebClient()
        ' The response is UTF-8: decode it as UTF-8 instead of the local (default) encoding
        myWebClient.Encoding = System.Text.Encoding.UTF8
        Dim myDownloadString As String = myWebClient.DownloadString(myUrlString)
        Dim myJsonFile As Newtonsoft.Json.Linq.JObject = Newtonsoft.Json.Linq.JObject.Parse(myDownloadString)
        ' "trans" holds the translated text of the first sentence
        myNewString = CStr(myJsonFile.SelectToken("sentences[0].trans"))
    End Using

    Return myNewString
End Function

Upvotes: 0

Herman Badenhorst

Reputation: 21

I have managed to crack this. It was not a JSON problem after all; the problem was the programmer. I added one line of code to my solution:

myWebClient.Encoding = System.Text.Encoding.UTF8

Newtonsoft.Json does parse the string correctly once WebClient decodes the incoming response as UTF-8.
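To see why that one line matters, here is a small, self-contained sketch that decodes the same UTF-8 bytes twice. The sample sentence is made up, and ISO-8859-1 is used only as a stand-in for a Western local code page such as Windows-1252 (which is not built into .NET Core without extra registration); for these particular characters both code pages give the same result.

```vbnet
Imports System
Imports System.Text

Module EncodingDemo
    ' Decodes raw bytes with the given encoding, which is what WebClient
    ' does internally with its Encoding property.
    Function DecodeWith(data As Byte(), enc As Encoding) As String
        Return enc.GetString(data)
    End Function

    Sub Main()
        ' Illustrative text; the server sends it as UTF-8 bytes.
        Dim original As String = "Le café est prêt"
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(original)

        ' Wrong: treating the UTF-8 bytes as a single-byte code page.
        ' Each two-byte character turns into two unrelated characters.
        Console.WriteLine(DecodeWith(utf8Bytes, Encoding.GetEncoding("ISO-8859-1")))

        ' Right: decoding with the encoding the server actually used.
        Console.WriteLine(DecodeWith(utf8Bytes, Encoding.UTF8))
    End Sub
End Module
```

With the wrong encoding, "é" (bytes C3 A9) becomes "Ã©", which is the kind of damage seen before setting Encoding to UTF-8. Cyrillic and Chinese characters use two or three bytes each in UTF-8, so they are garbled even more visibly.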

Thanks for your answer, Jimi.

Upvotes: 0

Jimi

Reputation: 32248

The data you're downloading is UTF-8 encoded. You can download the result as a byte array, using the DownloadData() method instead of DownloadString(), and then decode it with Encoding.UTF8.GetString().

' [...]

Dim data As Byte()
Dim jsonResult As String = String.Empty

Using client As New System.Net.WebClient()
    data = client.DownloadData(myUrlString)
    ' Decode the raw bytes explicitly as UTF-8
    jsonResult = System.Text.Encoding.UTF8.GetString(data)
End Using

' Deserialize the jsonResult object

Unfortunately, WebClient ignores the charset the server declares for the response: unless you set its Encoding property, it decodes the string using the local (system default) encoding. You can set the Encoding property, or decode the bytes yourself as shown above, but either way you need to know the encoding beforehand. Alternatively, you can read the encoding from the underlying WebResponse object.
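If you go the WebResponse route, the charset name still has to be turned into an Encoding. The sketch below assumes you already have that name, for example from the CharacterSet property of an HttpWebResponse; the helper name EncodingFromCharset is hypothetical, and the UTF-8 fallback is a design choice, not something the API does for you.

```vbnet
Imports System
Imports System.Text

Module CharsetHelper
    ' Hypothetical helper: maps a charset name (e.g. HttpWebResponse.CharacterSet)
    ' to an Encoding. Falls back to UTF-8 when the name is empty or unknown.
    Function EncodingFromCharset(charset As String) As Encoding
        If String.IsNullOrWhiteSpace(charset) Then Return Encoding.UTF8
        Try
            ' Some servers quote the value, e.g. charset="utf-8"
            Return Encoding.GetEncoding(charset.Trim().Trim(""""c))
        Catch ex As ArgumentException
            ' Encoding.GetEncoding throws ArgumentException for unsupported names
            Return Encoding.UTF8
        End Try
    End Function

    Sub Main()
        ' Decode downloaded bytes with the encoding named by the response
        Dim data As Byte() = Encoding.UTF8.GetBytes("Bonjour")
        Console.WriteLine(EncodingFromCharset("utf-8").GetString(data))
        Console.WriteLine(EncodingFromCharset("no-such-charset").WebName)
    End Sub
End Module
```

In a request-based download you would pass the response's CharacterSet to this helper and call GetString on the bytes you read. When the Content-Type header carries no charset, the property may be empty or hold a default, so the fallback is worth keeping.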

Option 2: use HttpClient instead of WebClient. This class honors the encoding (charset) specified by the remote source:

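' Declare a static (Shared) HttpClient object as a Field
Private Shared client As New System.Net.Http.HttpClient()

' Make an async method; an Async Function must return Task(Of String), not String
Private Async Function GetGoogleTranslation(textToTranslate As String) As Task(Of String)
    ' Build the request URL as in the question, percent-encoding the text
    Dim myUrlString As String =
        "https://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=fr&hl=en&dt=t&dt=bd&dj=1&source=icon&q=" &
        Uri.EscapeDataString(textToTranslate)

    Dim jsonResult As String = String.Empty
    Using response = Await client.GetAsync(myUrlString)
        If response.IsSuccessStatusCode Then
            ' ReadAsStringAsync decodes using the charset in the response's Content-Type header
            jsonResult = Await response.Content.ReadAsStringAsync()
        End If
    End Using
    If String.IsNullOrEmpty(jsonResult) Then Return String.Empty

    ' Deserialize and get the results you need
    Dim result = CStr(Newtonsoft.Json.Linq.JObject.Parse(jsonResult).SelectToken("sentences[0].trans"))
    Return result
End Function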

Upvotes: 1
