Ben Guild

Reputation: 5116

Go is generating unescaped control characters in JSON output due to emoji

I'm having trouble with something in Go and I'm not sure where to look. I'm fetching a UTF-8 string from a MySQL database, and attempting to return it in a JSON response to a client.

Different clients react differently, but iOS NSJSONSerialization returns an "Unescaped control character" error. This breaks the whole application. I can decode the JSON without issue in Chrome using JSON.parse(), though.

On the server-side, this same generator function written in another language besides Go works fine. Help?


EDIT: Here is the JSON that is causing the issue:

{ "test":"☮️" }

... If I omit this emoji, it works. If it's there, it doesn't work. The issue seems to be something related to there being two different encodings for certain emoji. One seems to trip up Go, but they are both valid.

To demonstrate the difference in encoding, some of the emoji show up in the database explorer and some do not:

(screenshot: emoji as displayed in the database explorer)

... These ones that appear in the database explorer are causing this issue with 100% reproducibility. However, all of them usually appear in the actual client software (not the database explorer) without issue. I don't know if there's a way to reconfigure the database connection to avoid this (or something), but it seems to work with different instances depending on what is doing the decoding and how forgiving it is. Considering that users could type or copy/paste either encoding... this needs to work consistently.

Any help would be appreciated. Thanks in advance.

Upvotes: 5

Views: 822

Answers (1)

Darigaaz

Reputation: 1500

Go is doing fine.

fmt.Println([]byte("☮️"))
//[226 152 174 239 184 143]
//Yup, 1 character - 6 bytes.

NSJSONSerialization can't handle this. Maybe this link will be helpful: "NSJSONSerialization and Emoji". It has something to do with NSData *utf32Data = [uniText dataUsingEncoding:NSUTF32LittleEndianStringEncoding];.

Can you give us the byte representation of the "☮️" symbol in "iOS style", like I did with Go?

UPD

I did some research, and it looks like something may be wrong with your database encoding. Is it UTF-16?

Check this out

// These look the same, but they are completely different "characters".
// The first one is yours; the second one is plain U+262E.
const nihongo = "☮️☮"
for index, runeValue := range nihongo {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}
bad := []byte("☮️")
good := []byte("☮")
fmt.Printf("%v %s \n", bad, bad)
fmt.Printf("%v %s \n", good, good)

Output:

U+262E '☮' starts at byte position 0
U+FE0F '️' starts at byte position 3
U+262E '☮' starts at byte position 6
[226 152 174 239 184 143] ☮️ 
[226 152 174] ☮ 

UPD2

It just hit me! I was copying and pasting your symbol the whole time, but it is not a single symbol. It's two code points: U+262E, followed by U+FE0F (VARIATION SELECTOR-16), which is unprintable on its own.

unprintable := []byte{239, 184, 143} // UTF-8 encoding of U+FE0F
fmt.Printf("valid? %v\n", utf8.Valid(unprintable))
fmt.Println("full rune?", utf8.FullRune(unprintable))
r, size := utf8.DecodeRune(unprintable)
fmt.Println(r, size, string(r))
fmt.Printf("valid rune? %v\n", utf8.ValidRune(r))

Output:

valid? true
full rune? true
65039 3 ️
valid rune? true

So your database is fine and the unprintable "character" is fine, but NSJSONSerialization cannot handle it. It would be better to ask the iOS community =)

Upvotes: 2
