Reputation: 21
I'm trying to write a list of all sub-directories to a file, but the Unicode characters in the sub-directory names are replaced by question marks. I'm using CLISP 2.49 on Windows XP.
Here is the short version of the code:
(let ((*pathname-encoding* (ext:make-encoding :charset 'charset:utf-8
                                              :line-terminator :dos)))
  (with-open-file (stream "folders.txt"
                          :direction :output
                          :if-exists :overwrite
                          :if-does-not-exist :create
                          :external-format (ext:make-encoding :charset 'charset:utf-8
                                                              :line-terminator :dos))
    (format stream "~A~&" (directory ".\\*\\"))))
Upvotes: 2
Views: 834
Reputation: 60014
You should be aware that *pathname-encoding* is a SYMBOL-MACRO, not a variable.
As the note in the CLISP manual says,
Reminder: You have to use EXT:LETF/EXT:LETF* for SYMBOL-MACROs; LET/LET* will not work!
So, what you need to do is
(ext:letf ((*pathname-encoding* charset:utf-8)) ...)
(the line-terminator mode of *pathname-encoding* is ignored anyway).
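Applied to the code in the question, the fix could look roughly like the sketch below. The function name write-directory-listing and its two parameters are hypothetical, introduced only for illustration. Two further changes are my own assumptions rather than part of the fix: :if-exists :supersede replaces :overwrite so that no stale bytes from an older, longer file remain, and the format directive prints one pathname per line. The sketch assumes it is evaluated in CL-USER, where *pathname-encoding* is accessible without a package prefix.

```lisp
;; CLISP-specific sketch: EXT:LETF, EXT:MAKE-ENCODING and CHARSET:UTF-8
;; are CLISP extensions, not portable Common Lisp.
(defun write-directory-listing (pattern output-file)
  "Write each pathname matching PATTERN to OUTPUT-FILE as UTF-8, one per line.
The name and signature are hypothetical, chosen for this example."
  ;; *PATHNAME-ENCODING* is a symbol macro, so it must be rebound with
  ;; EXT:LETF; a plain LET would only create an unrelated lexical binding.
  (ext:letf ((*pathname-encoding* charset:utf-8))
    (with-open-file (stream output-file
                            :direction :output
                            :if-exists :supersede   ; assumption: truncate old contents
                            :if-does-not-exist :create
                            :external-format (ext:make-encoding
                                              :charset charset:utf-8
                                              :line-terminator :dos))
      ;; ~{ ... ~} iterates over the list returned by DIRECTORY.
      (format stream "~{~A~%~}" (directory pattern)))))
```

On Windows, the call matching the question would be (write-directory-listing ".\\*\\" "folders.txt").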
$ touch 'идиотский файл'
$ ls
идиотский файл
$ LANG=C ls
?????????????????? ????????
$ LANG=C clisp -q -norc
> *pathname-encoding*
#<ENCODING CHARSET:ASCII :UNIX>
> *default-file-encoding*
#<ENCODING CHARSET:ASCII :UNIX>
> *terminal-encoding*
#<ENCODING CHARSET:ASCII :UNIX>
> (letf ((*pathname-encoding* charset:utf-8))
    (with-open-file (o "foo" :direction :output :external-format charset:utf-8)
      (format o "~A~%" (directory "*"))))
NIL
> (quit)
$ cat foo
(/home/sds/tmp/z/идиотский файл /home/sds/tmp/z/foo)
Under no circumstances will CLISP print or return ? instead of a character it cannot handle - it will signal an error (try omitting one of the encoding specs and you will get the error Invalid byte #xD0 in CHARSET:ASCII conversion - either from write or from directory).
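One way to observe this behaviour without touching the file system is to convert a string to bytes directly. The sketch below is only an illustration: the helper name encodable-p is hypothetical, and it relies on the CLISP extension ext:convert-string-to-bytes and on the predefined encodings signalling an error for characters they cannot represent, as described above.

```lisp
;; CLISP-specific sketch; ENCODABLE-P is a hypothetical helper name.
(defun encodable-p (string encoding)
  "Return T if STRING converts to bytes in ENCODING without an error, else NIL."
  (handler-case
      (progn (ext:convert-string-to-bytes string encoding) t)
    ;; If CLISP signals an error instead of substituting a character,
    ;; the failure is reported here as NIL.
    (error () nil)))

;; U+0438 is CYRILLIC SMALL LETTER I, the first letter of the file name
;; used in the session above. Compare the two results:
(encodable-p (string (code-char #x438)) charset:utf-8)
(encodable-p (string (code-char #x438)) charset:ascii)
```

If the ASCII call reports failure while the UTF-8 call succeeds, CLISP is refusing the character rather than silently writing a question mark.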
Therefore the problem is at the boundary:
(only the last option appears plausible).
What you could do is:
- check the encodings in effect (*pathname-encoding* &c);
- make sure *pathname-encoding* is utf-8 and try something like (coerce (pathname-name (car (directory "*"))) 'list) - in my example above I see (#\CYRILLIC_SMALL_LETTER_I ...); do you see unicode chars like I do, or do you see #\? instead?
- try cygwin (ls, ls | od, ls > foo; cat foo | od) to see whether it can capture the non-ASCII characters.
Upvotes: 2