I am using the Language Server Protocol (LSP) between VS Code and a self-implemented language server (a Windows executable, connected via socket). Looking at the transmitted data format, I am wondering how the range is calculated.
To explain with an example: in VS Code, a document is opened with the text
package aäbc
The first message about the text is:
{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"file:///d%3A/dev/test%20files/test_vscode1049-2.fidl","languageId":"francaidl","version":1,"text":"package aäbc"}}}
Note the value of text. The user then selects the b that follows the special character ä (German umlaut) and replaces it in one step with xyzÖ (which again contains a special character).
Looking at the received message, I see
{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///d%3A/dev/test%20files/test_vscode1049-2.fidl","version":2},"contentChanges":[{"range":{"start":{"line":0,"character":10},"end":{"line":0,"character":11}},"rangeLength":1,"text":"xyzÖ"}]}}
I would like to draw your attention to two things: in the raw data, ä arrives as a two-byte sequence (first message, didOpen), and Ö arrives as a two-byte sequence as well (second message, didChange).
My conclusion is that the exchanged text is UTF-8 encoded, while the positions appear to count code points (visible letters), which seems quite sensible to me. On the other hand, it seems that I have to adjust the given range when working with the transmitted 8-bit units of UTF-8, which is rather inconvenient.
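To make the counting concrete, here is a small Python sketch (Python is chosen only for illustration, and offsets_before is a hypothetical helper name) that computes the offset of b in the example line in three units: UTF-8 bytes, UTF-16 code units, and code points. For characters in the Basic Multilingual Plane such as ä, the UTF-16 count and the code-point count coincide, so the example above cannot tell the two apart. The second call uses an emoji outside that plane, which is my own addition and not part of the captured traffic.

```python
def offsets_before(text: str, needle: str) -> dict:
    """Offset of the first occurrence of `needle` in `text`, in three units."""
    prefix = text[: text.index(needle)]  # Python strings index by code point
    return {
        "utf8_bytes": len(prefix.encode("utf-8")),
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
        "code_points": len(prefix),
    }


# The line from the didOpen message: only the UTF-8 count differs.
print(offsets_before("package aäbc", "b"))
# → {'utf8_bytes': 11, 'utf16_units': 10, 'code_points': 10}

# Hypothetical variant with U+1F600, which needs a surrogate pair in UTF-16.
print(offsets_before("package a\U0001F600bc", "b"))
# → {'utf8_bytes': 13, 'utf16_units': 11, 'code_points': 10}
```

The reported start character of 10 therefore matches both a code-point count and a UTF-16 count; only a character beyond U+FFFF would separate them.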
However, the initialization process of the LSP has a property for the position encoding. You can assume that my response returned the default value utf-16 (it is the only entry in the request sent by VS Code, and the one that "must always be supported" according to the specification). So my concrete questions are: Shouldn't the range calculation respect positionEncoding anyway? And if not, what is the purpose of positionEncoding?
By the way: Nothing changes if no position encoding is set in the response, because of "If omitted it defaults to 'utf-16'" (again according to the specification).
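If the positions are in fact UTF-16 code units (the default named by the specification), a server that keeps its text as UTF-8 bytes would have to translate every character value. Below is a minimal sketch of such a translation for a single line. The helper name utf16_col_to_utf8_offset is hypothetical, and the sketch assumes the column never points into the middle of a surrogate pair.

```python
def utf16_col_to_utf8_offset(line: str, utf16_col: int) -> int:
    """Translate a column counted in UTF-16 code units into a UTF-8 byte offset."""
    units = 0        # UTF-16 code units consumed so far
    byte_offset = 0  # UTF-8 bytes consumed so far
    for ch in line:
        if units >= utf16_col:
            break
        # Characters beyond U+FFFF occupy two UTF-16 code units (a surrogate pair).
        units += 2 if ord(ch) > 0xFFFF else 1
        byte_offset += len(ch.encode("utf-8"))
    return byte_offset


line = "package aäbc"
start = utf16_col_to_utf8_offset(line, 10)  # 11, because ä takes two bytes
end = utf16_col_to_utf8_offset(line, 11)    # 12
print(line.encode("utf-8")[start:end])      # → b'b'
```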
Finally, I also checked the data that is transmitted over the socket: Wireshark shows UTF-8 encoding there as well. Does anybody know whether this can or must be changed to something else?
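As far as I read the base protocol section of the specification, the message body is encoded with the charset given in the Content-Type header, which defaults to utf-8, and Content-Length counts bytes rather than characters. A sketch of framing an outgoing message under that reading follows; frame_message is a hypothetical helper name, not something taken from the specification.

```python
import json


def frame_message(payload: dict) -> bytes:
    """Serialize a JSON-RPC payload with a Content-Length header counted in bytes."""
    # ensure_ascii=False keeps ä/Ö as real characters, so they become UTF-8 bytes
    # instead of \uXXXX escapes.
    body = json.dumps(payload, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


framed = frame_message({"text": "ä"})
print(framed)  # the header reports 13 bytes although the body has only 12 characters
```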