Position encoding in LSP

Question

I am using the Language Server Protocol (LSP) between VS Code and a self-implemented language server (a Windows executable communicating over a socket). Looking at the data format being exchanged, I am wondering how ranges are calculated.

Here is an example: in VS Code, a document is opened with the text

package aäbc

The first message concerning this text is:

{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"file:///d%3A/dev/test%20files/test_vscode1049-2.fidl","languageId":"francaidl","version":1,"text":"package aÃ¤bc"}}}
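The Ã¤ in the text field looks like classic mojibake: the two UTF-8 bytes of ä (0xC3 0xA4) displayed as if they were two Windows-1252 characters. A minimal Python sketch reproduces both garbled strings from the messages. It assumes that whatever printed the messages decoded UTF-8 bytes as Windows-1252. That is a guess and is not confirmed by the logs. The function names are hypothetical:

```python
def as_cp1252_mojibake(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Windows-1252.

    Assumption: the message viewer makes exactly this mistake. Raises for
    characters whose UTF-8 bytes are undefined in cp1252 (e.g. 0x81).
    """
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the mistake: re-encode as Windows-1252 and decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


print("ä".encode("utf-8"))                 # b'\xc3\xa4'
print(as_cp1252_mojibake("package aäbc"))  # package aÃ¤bc
print(as_cp1252_mojibake("xyzÖ"))          # xyzÃ–
print(repair("package aÃ¤bc"))            # package aäbc
```

If the round trip through repair gives back the original text, the bytes themselves were correct UTF-8 and only their display was wrong.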

Note the value of text. The user then selects the b after the special character ä (a German umlaut) and replaces it in one step with xyzÖ (which again contains a special character).

Looking at the received message, I see

{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///d%3A/dev/test%20files/test_vscode1049-2.fidl","version":2},"contentChanges":[{"range":{"start":{"line":0,"character":10},"end":{"line":0,"character":11}},"rangeLength":1,"text":"xyzÃ–"}]}}
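To check the numbers, here is a minimal Python sketch that applies this single-line change. It assumes that character counts UTF-16 code units, which is the value negotiated through positionEncoding. The helper names are hypothetical, and multi-line ranges are omitted for brevity:

```python
def utf16_to_index(line: str, character: int) -> int:
    """Map a UTF-16 code-unit offset within `line` to a Python str index (code points)."""
    units = 0
    for index, ch in enumerate(line):
        if units >= character:
            return index
        # Characters above U+FFFF take two UTF-16 code units (a surrogate pair).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line)  # offset at (or past) the end of the line


def apply_change(line: str, start: int, end: int, new_text: str) -> str:
    """Replace the UTF-16 range [start, end) of a single line with new_text."""
    i, j = utf16_to_index(line, start), utf16_to_index(line, end)
    return line[:i] + new_text + line[j:]


print(apply_change("package aäbc", 10, 11, "xyzÖ"))  # package aäxyzÖc
```

Python strings are indexed by code points, so the helper only needs special handling for characters above U+FFFF. For ä, the UTF-16 count and the code-point count are identical.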

I would like to draw your attention to two things:

1. The transmitted text contains odd encodings: ä appears as Ã¤ (first message, didOpen) and Ö as Ã– (second message, didChange).
2. The start position is character=10 (see range.start.character in didChange). So the actual letters are counted: in UTF-8 bytes, b would be at offset 11, because ä takes two bytes.
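The character value can be checked against the three units LSP allows (utf-8, utf-16, utf-32). Below is a short Python sketch of such a check, using a hypothetical helper that computes the offset of b in each unit. Because ä lies in the Basic Multilingual Plane, the UTF-16 count and the code-point count coincide, so this example alone cannot distinguish them. A character outside the BMP, such as an emoji, can:

```python
def offsets(prefix: str) -> dict:
    """Length of `prefix` in each LSP position unit."""
    return {
        "utf-8": len(prefix.encode("utf-8")),              # bytes
        "utf-16": len(prefix.encode("utf-16-le")) // 2,    # code units; "-le" avoids a BOM
        "utf-32": len(prefix),                             # code points (Python str length)
    }


print(offsets("package aä"))  # {'utf-8': 11, 'utf-16': 10, 'utf-32': 10}
print(offsets("package a😀"))  # {'utf-8': 13, 'utf-16': 11, 'utf-32': 10}
```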

My conclusion is that the exchanged text is UTF-8, but positions are counted in code points (visible letters), which seems quite sensible to me. On the other hand, this means I have to adjust the given range whenever I work with the transmitted 8-bit UTF-8 code units, which is rather inconvenient.
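If the server stores each line as raw UTF-8 bytes, the adjustment amounts to walking the bytes and counting UTF-16 code units along the way. Here is a minimal Python sketch, assuming well-formed UTF-8 and the utf-16 position encoding (the function name is hypothetical):

```python
def utf16_to_utf8_offset(line: bytes, character: int) -> int:
    """Convert an LSP UTF-16 `character` to a byte offset into the UTF-8 `line`.

    Assumes valid UTF-8, so every sequence starts with a proper lead byte.
    """
    byte_pos = 0
    units = 0
    while byte_pos < len(line) and units < character:
        lead = line[byte_pos]
        if lead < 0x80:      # 1-byte sequence (ASCII)
            size, width = 1, 1
        elif lead < 0xE0:    # 2-byte sequence (U+0080..U+07FF), e.g. ä
            size, width = 2, 1
        elif lead < 0xF0:    # 3-byte sequence (rest of the BMP)
            size, width = 3, 1
        else:                # 4-byte sequence: a surrogate pair in UTF-16
            size, width = 4, 2
        byte_pos += size
        units += width
    return byte_pos


line = "package aäbc".encode("utf-8")
start = utf16_to_utf8_offset(line, 10)
end = utf16_to_utf8_offset(line, 11)
print(start, end, line[start:end])  # 11 12 b'b'
```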

However, the LSP initialization includes a property for the position encoding. Assume that my response returned the default value utf-16 (which is actually the only value in the request sent by VS Code, and which "must always be supported" according to the specification). So my concrete questions are: Shouldn't the range calculation respect positionEncoding anyway? And if not, what is the purpose of positionEncoding?
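As far as I understand the LSP 3.17 specification, positionEncoding only selects the unit in which Position.character is counted. It does not change how the JSON text itself is encoded. The server may only pick a value the client offered in capabilities.general.positionEncodings, and that list defaults to ["utf-16"] when absent. Here is a Python sketch of that choice, assuming the initialize params have already been parsed into a dict (the function name is hypothetical):

```python
def choose_position_encoding(initialize_params: dict) -> str:
    """Pick the server's positionEncoding from the encodings the client offers."""
    offered = (
        initialize_params.get("capabilities", {})
        .get("general", {})
        .get("positionEncodings", ["utf-16"])  # an absent list means ["utf-16"]
    )
    # Design choice (not required): prefer utf-8 if offered, since it matches
    # byte-oriented storage; otherwise fall back to the mandatory utf-16.
    return "utf-8" if "utf-8" in offered else "utf-16"


# Trimmed-down initialize params, mirroring what VS Code is described as sending.
vscode_params = {"capabilities": {"general": {"positionEncodings": ["utf-16"]}}}
result = {"capabilities": {"positionEncoding": choose_position_encoding(vscode_params)}}
print(result)  # {'capabilities': {'positionEncoding': 'utf-16'}}
```

With a client that only offers utf-16, the server has no choice, so the character values it receives stay in UTF-16 code units.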

By the way, nothing changes if no position encoding is set in the response either, because "If omitted it defaults to 'utf-16'" (again according to the specification).

Finally, I checked the data sent over the socket: Wireshark also shows UTF-8 encoding. Does anybody know whether this can, or must, be changed to something else?
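On the wire, the LSP base protocol frames every message with a Content-Length header that counts bytes. As far as I can tell from the specification, the Content-Type charset defaults to utf-8, which is described as the only encoding currently supported. So UTF-8 in Wireshark is expected and is independent of positionEncoding. Here is a minimal Python sketch of the framing (the function name is hypothetical, and the message is trimmed):

```python
import json


def frame(message: dict) -> bytes:
    """Serialize a JSON-RPC message with the LSP base-protocol header."""
    # ensure_ascii=False keeps ä as raw UTF-8 bytes (0xC3 0xA4), as seen in Wireshark;
    # the default would escape it as \u00e4, which is equally valid JSON.
    body = json.dumps(message, ensure_ascii=False).encode("utf-8")
    # Content-Length counts the bytes of the encoded body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


text = "package aäbc"
print(len(text), len(text.encode("utf-8")))  # 12 13
packet = frame({"jsonrpc": "2.0", "method": "textDocument/didOpen",
                "params": {"textDocument": {"text": text}}})
print(packet.split(b"\r\n")[0])
```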
