Gary

Reputation: 2859

Encoding of web socket channel and/or [encoding convertto...] messages?

I have data in SQLite that is text including a curly apostrophe (U+2019, &#8217;), stored as the literal character, as in work’s. A browser requests the data over a web socket connection. I first thought the connection was configured as chan configure $sock -encoding iso8859-1 -translation crlf -buffering full, simply because that is what I set before sending the headers to the browser in response to the request to upgrade the connection to a web socket, and I failed to consider it again thereafter. I was mistaken: it is configured as chan configure $sock -buffering full -blocking 0 -translation binary.

If I use replace( ..., '’', '&#8217') in the SQL then the browser displays the text with the curly apostrophe rendered normally. And if I follow the example in this SO question and encode the result of the SQL query as set encoded [encoding convertto utf-8 $SQLResult] it renders correctly also.

What is the correct way to handle this? Should the SQL results be encoded to utf-8, or should the web socket be configured differently? If the socket is configured with -encoding utf-8, then it fails on the first message sent from the browser/client.

Thank you.

Upvotes: 1

Views: 520

Answers (1)

Donal Fellows

Reputation: 137567

The data in SQLite should be Unicode (probably UTF-8 but that shouldn't matter; the package binding handles that). That means that the data arrives in Tcl correctly encoded. Tcl's internal encoding schemes are complicated and should be ignored for the purposes of this discussion: pretend that's all Unicode too.
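To make the distinction between characters and bytes concrete, here is a minimal sketch. It assumes nothing beyond core Tcl; the variable names are only illustrative, and the string is written with a \u escape so the example does not depend on the encoding of the script file.

```tcl
# A Tcl string holding the text from the question, with U+2019 as one character.
set text "work\u2019s"

# Explicitly convert the characters to UTF-8 bytes.
set bytes [encoding convertto utf-8 $text]

puts [string length $text]    ;# 6 characters
puts [string length $bytes]   ;# 8 bytes: U+2019 takes three bytes in UTF-8

# Show the bytes as hex to see the three-byte sequence e2 80 99.
binary scan $bytes H* hex
puts $hex                     ;# 776f726be2809973
```

Above the point where the conversion happens, the value is just a six-character string; only the byte form needs to know about UTF-8.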

With a websocket, you need to say what sort of messages ("frames") you are sending. For normal payloads, they can be either binary frames or text frames. Binary frames are a bunch of bytes. Text frames are UTF-8 (as specified by the standard). STOMP frames are text frames where the content is JSON (with a few basic rules and some higher-order things over what operations can be stated).

If you are sending text, you should use UTF-8. Ideally, do that by configuring the socket accordingly in the part of the websocket client that mediates between the point that receives the text to send (maybe JSON) and the part that writes the frame onto the underlying socket itself. Above that point, you should just work with ordinary Tcl strings. If you're forming JSON messages in Tcl, the rl_json package is recommended, or you can get SQLite to produce JSON directly. (JSON's serialized form is always UTF-8, and it is conceptually Unicode.)
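As a rough sketch of that mediating layer, the following assumes the channel stays configured with -translation binary, as in the question, and that the UTF-8 conversion is done explicitly when the frame is built. The procedure name wsTextFrame is hypothetical, not part of any package. It builds a single unfragmented, unmasked text frame, which is the form a server sends to a browser.

```tcl
# Hypothetical helper: wrap an ordinary Tcl string in one websocket text frame.
proc wsTextFrame {text} {
    # Text frames carry UTF-8, so convert characters to bytes here and only here.
    set payload [encoding convertto utf-8 $text]
    set len [string length $payload]

    # First byte 0x81 = FIN bit set + opcode 1 (text). No mask bit: server to client.
    if {$len < 126} {
        set frame [binary format cc 0x81 $len]
    } elseif {$len < 65536} {
        # Length marker 126, then a 16-bit big-endian length.
        set frame [binary format ccS 0x81 126 $len]
    } else {
        # Length marker 127, then a 64-bit big-endian length.
        set frame [binary format ccW 0x81 127 $len]
    }
    append frame $payload
    return $frame
}

# Demonstration: inspect the frame bytes for the text from the question.
binary scan [wsTextFrame "work\u2019s"] H* frameHex
puts $frameHex   ;# 8108776f726be2809973
```

With something like this in place, the calling code would pass the SQL result straight through as a normal string, for example puts -nonewline $sock [wsTextFrame $SQLResult] followed by flush $sock, and nothing above that call needs to know about encodings.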

If you are sending binary frames, you handle all the details yourself. Using encoding convertto utf-8 may help with forming the message frame body, or you can use any other encoding scheme you prefer (such as the one you mention in your question).

Upvotes: 2
