I am playing with SML/NJ (version 110.99.4) on Windows 10.
I have the following code in a UTF-8-encoded text file:
```
let
  val s : string = "søk"
in
  print s
end;
```
My console uses code page 65001 (UTF-8), as `chcp` reports.
This code prints `søk`, as expected. So I have three questions:
1. The Basis Library specifies `WideString` (and `WideChar`) for Unicode, but they are optional, and on Windows they are actually missing. So I assumed that `string` was an ASCII string, but apparently it is not. What, then, is the `string` type: a sequence of code points? UTF-8?
2. Is this behaviour specific to `string` in SML/NJ? Can I rely on it wherever I want UTF-8 (on Linux, for example)?
3. Does `string` behave the same way in all SML implementations?

PS. My SML/NJ version also has a `UTF8` structure (`open UTF8` succeeds), which refers to `wchar`. Yet plain `string` prints non-ASCII text correctly, while the `String` structure is defined in terms of `char`. This confuses me even more: does a `string` contain `wchar`s, or `char`s (holding UTF-8)? And what is the missing `WideChar` for, then?
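In the Standard ML Basis Library, `Char.char` is an 8-bit type (`Char.maxOrd` is 255) and `string` is a vector of those `char`s, so a `string` is simply a sequence of bytes with no encoding attached. When the source file is UTF-8, the literal `"søk"` holds the UTF-8 bytes of `ø`, and `print` writes them out as-is; a code-page-65001 console then displays them correctly. The sketch below illustrates this with Basis functions only. It writes the bytes as `\ddd` escapes so it does not depend on the source file's encoding, shows that `String.size` counts bytes, and decodes code points by hand. `utf8CodePoints` is a hypothetical helper (not part of SML/NJ's `UTF8` structure or any library), and it assumes well-formed UTF-8.

```sml
(* Hypothetical helper (not a library function): decode a well-formed UTF-8
   byte string into its code points, using only the Basis Library. *)
fun utf8CodePoints (s : string) : word list =
  let
    val bytes = map (Word.fromInt o Char.ord) (String.explode s)
    (* payload bits of a continuation byte 10xxxxxx *)
    fun cont b = Word.andb (b, 0wx3F)
    fun decode [] = []
      | decode (b :: rest) =
          if b < 0wx80 then b :: decode rest                 (* 0xxxxxxx *)
          else if b < 0wxC0 then raise Fail "unexpected continuation byte"
          else if b < 0wxE0 then                             (* 110xxxxx + 1 *)
            (case rest of
               c1 :: rest' =>
                 Word.orb (Word.<< (Word.andb (b, 0wx1F), 0w6), cont c1)
                 :: decode rest'
             | _ => raise Fail "truncated UTF-8 sequence")
          else if b < 0wxF0 then                             (* 1110xxxx + 2 *)
            (case rest of
               c1 :: c2 :: rest' =>
                 Word.orb (Word.<< (Word.andb (b, 0wx0F), 0w12),
                           Word.orb (Word.<< (cont c1, 0w6), cont c2))
                 :: decode rest'
             | _ => raise Fail "truncated UTF-8 sequence")
          else                                               (* 11110xxx + 3 *)
            (case rest of
               c1 :: c2 :: c3 :: rest' =>
                 Word.orb (Word.<< (Word.andb (b, 0wx07), 0w18),
                           Word.orb (Word.<< (cont c1, 0w12),
                                     Word.orb (Word.<< (cont c2, 0w6), cont c3)))
                 :: decode rest'
             | _ => raise Fail "truncated UTF-8 sequence")
  in
    decode bytes
  end

val s = "s\195\184k"       (* the UTF-8 bytes of "søk": 115, 195, 184, 107 *)
val nBytes = String.size s  (* 4: String.size counts bytes *)
val cps = utf8CodePoints s  (* [0wx73, 0wxF8, 0wx6B], i.e. U+0073 U+00F8 U+006B *)
val () = print (Int.toString nBytes ^ " bytes, "
                ^ Int.toString (length cps) ^ " code points\n")
(* prints: 4 bytes, 3 code points *)
```

Because the Basis fixes `Char` at 8 bits, this byte-oriented reading of `string` should hold in any implementation that follows the Basis, not only SML/NJ; anything that needs characters rather than bytes (length in characters, indexing, case mapping) has to decode first.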
PPS. Entering a non-ASCII string in an `sml.bat` REPL session fails with:
```
stdIn:2.10 Error: illegal non-printing character in string
stdIn:2.11 Error: illegal non-printing character in string
stdIn:2.12 Error: illegal non-printing character in string
...
```
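Whatever the exact cause on the Windows console, one way to sidestep this lexer error is to keep the REPL input pure ASCII and write the UTF-8 bytes as `\ddd` decimal escapes, which standard SML string-literal syntax supports. `ø` (U+00F8) is encoded in UTF-8 as the bytes 195 and 184. A minimal sketch:

```sml
(* "søk" typed as pure ASCII: \195\184 are the two UTF-8 bytes of U+00F8 (ø) *)
val s = "s\195\184k";
val () = print (s ^ "\n");   (* a UTF-8 console (code page 65001) should show søk *)

(* To find the escapes for some text, String.toString can be applied to a
   string that already holds the right bytes (e.g. a literal in a UTF-8 file):
   characters with codes above 126 come back as \ddd escapes. *)
val escaped = String.toString s;   (* the text s\195\184k *)
```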
Sorry for so many questions; I would appreciate any clarification about the state of Unicode and UTF-8 in the world of Standard ML (and SML/NJ), and about convenient ways to work with them.
Answer:
I found, for instance, this library: https://github.com/cannam/sml-utf8, which defines `WdString`. It can encode and decode between UTF-8 strings and wide strings, and it provides the other "standard" (for SML) string operations. I tried it with SML/NJ and it seems to work.
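The library's own API is not reproduced here. As a rough picture of what "encoding to UTF-8" involves, the sketch below turns a list of code points (represented as `word`s, an assumption of this sketch) into a UTF-8 byte string using only the Basis Library. `utf8EncodeOne` and `utf8Encode` are hypothetical names, not functions from sml-utf8 or SML/NJ, and the code does not validate its input (surrogates and out-of-range values are not checked).

```sml
(* Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
   Sketch only: the code point is not validated. *)
fun utf8EncodeOne (cp : word) : string =
  let
    fun byte w = Char.chr (Word.toInt w)
    (* continuation byte 10xxxxxx carrying 6 bits of cp, starting at shift *)
    fun cont shift =
      byte (Word.orb (0wx80, Word.andb (Word.>> (cp, shift), 0wx3F)))
  in
    if cp < 0wx80 then String.str (byte cp)                     (* 1 byte *)
    else if cp < 0wx800 then                                    (* 2 bytes *)
      String.implode [byte (Word.orb (0wxC0, Word.>> (cp, 0w6))), cont 0w0]
    else if cp < 0wx10000 then                                  (* 3 bytes *)
      String.implode [byte (Word.orb (0wxE0, Word.>> (cp, 0w12))),
                      cont 0w6, cont 0w0]
    else                                                        (* 4 bytes *)
      String.implode [byte (Word.orb (0wxF0, Word.>> (cp, 0w18))),
                      cont 0w12, cont 0w6, cont 0w0]
  end

(* Hypothetical helper: encode a whole list of code points. *)
fun utf8Encode (cps : word list) : string =
  String.concat (map utf8EncodeOne cps)

(* U+0073 U+00F8 U+006B is "søk"; a UTF-8 console should display it *)
val () = print (utf8Encode [0wx73, 0wxF8, 0wx6B] ^ "\n")
```

The result is an ordinary `string`, so `print` writes it as-is and `String.size` still counts bytes (4 for the example above, for 3 code points).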