Bill Evans at Mariposa

Reputation: 3848

sbcl: list all valid character encodings

To get the list of all valid encodings for SBCL, I try every keyword as the :external-format of a scratch file and keep the ones that open without an error:

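(let (encoding-list)
  (let (symbol-list)
    ;; Collect every external symbol of the KEYWORD package as a candidate.
    (do-external-symbols (s :keyword)
      (push s symbol-list))
    (setf symbol-list (sort symbol-list #'string<))
    ;; Keep each candidate that is accepted as an :external-format.
    (mapc (lambda (x)
            (when (ignore-errors
                    (with-open-file
                      (phyle "scratch1"
                        :direction       :output
                        :if-exists       :supersede
                        :external-format x)
                      1))  ; <-- produce something non-NIL
              (push x encoding-list)))
          symbol-list))
  ;; Entries were pushed in sorted order, so reverse to restore it.
  (nreverse encoding-list))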

Is there an easier way to do this in SBCL? (For example, in CLISP, all the encodings are external symbols in the CHARSET package.)

Upvotes: 1

Views: 389

Answers (1)

jkiiski

Reputation: 8411

The only "official" list is in the manual. If you don't mind looking into SBCL's internals, external formats are stored in a hash table, SB-IMPL::*EXTERNAL-FORMATS*, whose keys are the format names (aliases included).

CL-USER> (alexandria:hash-table-keys sb-impl::*external-formats*)
(:UTF32BE :UTF-32BE :UTF32LE :UTF-32LE :UTF16BE :UTF-16BE :UTF16LE :UTF-16LE
 :UCS4BE :UCS-4BE :UCS4LE :UCS-4LE :UCS2BE :UCS-2BE :UCS2LE :UCS-2LE :CP932
 :|Shift_JIS| :SJIS :SHIFT_JIS :|eucJP| :EUCJP :EUC-JP :CP936 :GBK :|macintosh|
 :MACINTOSH :|mac| :MAC :|MacRoman| :|mac-roman| :MAC-ROMAN :|windows-1258|
 :WINDOWS-1258 :|cp1258| :CP1258 :|windows-1257| :WINDOWS-1257 :|cp1257|
 :CP1257 :|windows-1256| :WINDOWS-1256 :|cp1256| :CP1256 :|windows-1255|
 :WINDOWS-1255 :|cp1255| :CP1255 :|cp1254| :CP1254 :|windows-1253|
 :WINDOWS-1253 :|cp1253| :CP1253 :|windows-1252| :WINDOWS-1252 :|cp1252|
 :CP1252 :|windows-1251| :WINDOWS-1251 :|cp1251| :CP1251 :|windows-1250|
 :WINDOWS-1250 :|cp1250| :CP1250 :ISO8859-15 :ISO-8859-15 :LATIN9 :LATIN-9
 :|latin-8| :LATIN-8 :|iso-8859-14| :ISO-8859-14 :|latin-7| :LATIN-7
 :|iso-8859-13| :ISO-8859-13 :|iso-8859-11| :ISO-8859-11 :|latin-6| :LATIN-6
 :|iso-8859-10| :ISO-8859-10 :|latin-5| :LATIN-5 :|iso-8859-9| :ISO-8859-9
 :|iso-8859-8| :ISO-8859-8 :|iso-8859-7| :ISO-8859-7 :|iso-8859-6| :ISO-8859-6
 :|iso-8859-5| :ISO-8859-5 :|latin-4| :LATIN-4 :|iso-8859-4| :ISO-8859-4
 :|latin-3| :LATIN-3 :|iso-8859-3| :ISO-8859-3 :|latin-2| :LATIN-2
 :|iso-8859-2| :ISO-8859-2 :|cp874| :CP874 :|cp869| :CP869 :|cp866| :CP866
 :|cp865| :CP865 :|cp864| :CP864 :|cp863| :CP863 :|cp862| :CP862 :|cp861|
 :CP861 :|cp860| :CP860 :|cp857| :CP857 :|cp855| :CP855 :|cp852| :CP852
 :|cp850| :CP850 :|cp437| :CP437 :|x-mac-cyrillic| :X-MAC-CYRILLIC :|koi8-u|
 :KOI8-U :|koi8-r| :KOI8-R :IBM037 :IBM-037 :|cp037| :CP037 :EBCDIC-US :UTF8
 :UTF-8 :ISO8859-1 :ISO-8859-1 :LATIN1 :LATIN-1 :|646| :ISO-646-US :ISO-646
 :ANSI_X3.4-1968 :US-ASCII :ASCII)

Of course, since this is not a public API, there is no guarantee that it won't break in future releases.
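
Given that caveat, here is a minimal defensive sketch of the same lookup that does not need Alexandria. The function name LIST-EXTERNAL-FORMAT-NAMES is made up for illustration, and the code assumes the internal variable is still a hash table keyed by symbols; where that assumption does not hold (another SBCL version, or another implementation) it returns NIL rather than signalling an error.

```lisp
;; Sketch only: SB-IMPL::*EXTERNAL-FORMATS* is an SBCL internal, not a public API.
;; LIST-EXTERNAL-FORMAT-NAMES is a hypothetical helper name.
(defun list-external-format-names ()
  "Return the symbol keys of SBCL's internal external-format table, sorted
by name, or NIL if the table is missing or is not a hash table."
  (let* ((pkg (find-package "SB-IMPL"))
         ;; Look the symbol up at run time so this file also loads
         ;; on implementations that have no SB-IMPL package.
         (sym (and pkg (find-symbol "*EXTERNAL-FORMATS*" pkg))))
    (when (and sym (boundp sym) (hash-table-p (symbol-value sym)))
      (sort (loop for key being the hash-keys of (symbol-value sym)
                  ;; Skip anything that is not a symbol so STRING< is safe.
                  when (symbolp key)
                    collect key)
            #'string<))))
```

Calling (list-external-format-names) at the REPL should give the same names as the HASH-TABLE-KEYS call above, in sorted order. If it returns NIL, fall back to the list in the manual.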

Upvotes: 3
