Matt

Reputation: 43

Position encoding in LSP

I am using the Language Server Protocol (LSP) between VS Code and a self-implemented language server executable (Windows, .exe, connected via socket). While inspecting the transmitted data format, I started wondering about the range calculation.

To explain with an example: a document is opened in VS Code with the text

package aäbc

The first message containing the text is:

{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"file:///d%3A/dev/test%20files/test_vscode1049-2.fidl","languageId":"francaidl","version":1,"text":"package aäbc"}}}

Note the value of text. The user then selects the b after the special character ä (a German umlaut) and replaces it in one step with xyzÖ (which again includes a special character).

Looking at the received message, I see

{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///d%3A/dev/test%20files/test_vscode1049-2.fidl","version":2},"contentChanges":[{"range":{"start":{"line":0,"character":10},"end":{"line":0,"character":11}},"rangeLength":1,"text":"xyzÖ"}]}}

I would like to draw your attention to two things: the encoding of the transmitted text and the way the positions in range are counted.

My conclusion is that the exchanged text is encoded in UTF-8, but the positions appear to count code points (visible letters), which seems sensible to me. On the other hand, it seems that I have to adjust the given range when working with the transmitted 8-bit code units of UTF-8, which is rather inconvenient.
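
The three ways of counting can be compared directly on the example text. The following Python sketch computes the offset of b in code points, UTF-16 code units and UTF-8 bytes; the helper name `offsets_of` is made up for illustration. For this particular text, code points and UTF-16 code units coincide, because ä lies in the Basic Multilingual Plane. They would only differ for characters outside it, such as an emoji, so this example alone cannot distinguish the two.

```python
def offsets_of(text: str, needle: str) -> dict:
    """Offset of the first `needle` in `text`, counted in three different units.

    Hypothetical helper for illustration only; not part of any LSP library.
    """
    cp = text.index(needle)          # Python str indexes by code point
    prefix = text[:cp]
    return {
        "code_points": cp,
        # UTF-16 without BOM uses 2 bytes per code unit
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
        "utf8_bytes": len(prefix.encode("utf-8")),
    }


# The text from the didOpen message: ä is 1 UTF-16 unit but 2 UTF-8 bytes.
print(offsets_of("package aäbc", "b"))
# An emoji outside the BMP: 1 code point, 2 UTF-16 units, 4 UTF-8 bytes.
print(offsets_of("a😀b", "b"))
```

With the example text, only the UTF-8 byte count (11) differs from the transmitted character value of 10.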

However, the initialization process of the LSP includes a property for the position encoding. You can assume that my response returned the default value utf-16 (in fact the only entry in the request sent by VS Code, and the one that "must always be supported" according to the specification). So my concrete questions are: Shouldn't the calculation of the range respect positionEncoding anyway? And if not, what is the purpose of positionEncoding?
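
If the server stores each line as UTF-8 bytes, one way to handle the adjustment is to translate the received character offsets into byte offsets before applying the change. The sketch below assumes the negotiated position encoding is utf-16 and that the line is valid UTF-8; the function names `utf16_col_to_utf8_byte` and `apply_change` are hypothetical and not taken from any library.

```python
def utf16_units(ch: str) -> int:
    # Characters outside the Basic Multilingual Plane need a surrogate pair.
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_col_to_utf8_byte(line: str, col: int) -> int:
    """Translate a UTF-16 code-unit column into a UTF-8 byte offset.

    Assumption: `col` follows the 'utf-16' position encoding. A column past
    the end of the line is clamped to the line length.
    """
    units = 0
    byte_offset = 0
    for ch in line:
        if units >= col:
            break
        units += utf16_units(ch)
        byte_offset += len(ch.encode("utf-8"))
    return byte_offset


def apply_change(line_bytes: bytes, start_col: int, end_col: int, new_text: str) -> bytes:
    """Apply a single-line range replacement to a line stored as UTF-8 bytes."""
    line = line_bytes.decode("utf-8")
    start = utf16_col_to_utf8_byte(line, start_col)
    end = utf16_col_to_utf8_byte(line, end_col)
    return line_bytes[:start] + new_text.encode("utf-8") + line_bytes[end:]


# Values from the didChange message: character 10 to 11 replaced by "xyzÖ".
original = "package aäbc".encode("utf-8")
print(apply_change(original, 10, 11, "xyzÖ").decode("utf-8"))
```

For the didChange message shown earlier, character 10 maps to byte offset 11, and the resulting line is "package aäxyzÖc".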

By the way, nothing changes if no position encoding is set in the response, since "If omitted it defaults to 'utf-16'" (again according to the specification).

Finally, I also checked the data transmitted over the socket: Wireshark shows UTF-8 encoding there as well. Does anybody know whether this can, or must, be changed to something else?
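
For reference, the encoding seen on the wire belongs to the message framing rather than to the position counting. As far as the base protocol is described in the specification, the content part defaults to UTF-8 and the Content-Length header counts bytes, not characters. A minimal sketch of such framing, with a made-up helper name `frame`, might look like this:

```python
import json


def frame(message: dict) -> bytes:
    """Frame a JSON-RPC message with a Content-Length header.

    Assumption: the body is sent as UTF-8, and the header value is the
    length of the encoded body in bytes.
    """
    # ensure_ascii=False keeps ä as a raw character instead of \u00e4
    body = json.dumps(message, ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


framed = frame({"text": "package aäbc"})
header, body = framed.split(b"\r\n\r\n", 1)
print(header.decode("ascii"))
print(len(body), len(body.decode("utf-8")))  # byte count vs. character count
```

Because ä occupies two bytes in UTF-8, the byte count in the header is one larger than the number of characters in the body.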

Upvotes: 1

Views: 69

Answers (0)
