Hecatomb

Reputation: 21

Listing directory names with Unicode symbols in them isn't working correctly

I'm trying to write a list of all the sub-directories to a file, but the Unicode symbols in the sub-directory names get replaced by question marks. I'm using CLISP 2.49 on Windows XP.

Here is the short version of the code:

(let ((*pathname-encoding* (ext:make-encoding :charset 'charset:utf-8
                                              :line-terminator :dos)))
    (with-open-file (stream "folders.txt"
                     :direction :output
                     :if-exists :overwrite
                     :if-does-not-exist :create
                     :external-format (ext:make-encoding :charset 'charset:utf-8
                                                         :line-terminator :dos))
       (format stream "~A~&" (directory ".\\*\\"))))

Upvotes: 2

Views: 834

Answers (1)

sds

Reputation: 60014

What you are doing wrong

You should be aware that *pathname-encoding* is a SYMBOL-MACRO, not a variable. As the note in the CLISP manual says,

Reminder: You have to use EXT:LETF/EXT:LETF* for SYMBOL-MACROs; LET/LET* will not work!

So, what you need to do is

(ext:letf ((*pathname-encoding* charset:utf-8)) ...)

(the line-terminator mode of *pathname-encoding* is ignored anyway).

Example

$ touch 'идиотский файл'
$ ls
идиотский файл
$ LANG=C ls
?????????????????? ????????
$ LANG=C clisp -q -norc 
> *pathname-encoding* 
#<ENCODING CHARSET:ASCII :UNIX>
> *default-file-encoding* 
#<ENCODING CHARSET:ASCII :UNIX>
> *terminal-encoding* 
#<ENCODING CHARSET:ASCII :UNIX>
> (letf ((*pathname-encoding* charset:utf-8))
    (with-open-file (o "foo" :direction :output :external-format charset:utf-8) 
      (format o "~A~%" (directory "*"))))
NIL
> (quit)
$ cat foo
(/home/sds/tmp/z/идиотский файл /home/sds/tmp/z/foo)

Debugging your specific problem

Under no circumstances will CLISP print or return ? in place of a character it cannot handle; it will signal an error instead. (Try omitting one of the encoding specs and you will get an error such as Invalid byte #xD0 in CHARSET:ASCII conversion, either from write or from directory.)

Therefore the problem is at the boundary:

  • either the OS gives CLISP question marks instead of Unicode (because it thinks that CLISP cannot handle i18n)
  • or the files produced by CLISP are incorrectly saved by the low-level OS layer
  • or the tools you are using to view the files cannot display the Unicode characters

(only the last option appears plausible).
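
One way to tell these cases apart is to look at the raw bytes of the output file rather than at how a viewer renders it. The following is only a sketch, assuming a POSIX shell with od, tr and wc (for example under Cygwin); sample.txt and the two helper functions are hypothetical stand-ins, not part of your setup. A literal question mark is the byte 3f, while UTF-8 encoded non-ASCII characters show up as bytes of 80 and above.

```shell
# Hypothetical helper: count literal '?' bytes in a file.
count_question_marks() {
    # Delete everything except '?' and count what remains.
    tr -cd '?' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: count bytes outside the ASCII range in a file.
count_non_ascii_bytes() {
    # Delete octal 000-177 (ASCII); whatever remains is >= 0x80.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Stand-in for folders.txt: "dir-" followed by U+0438 (UTF-8 bytes d0 b8).
printf 'dir-\320\270\n' > sample.txt

od -An -tx1 sample.txt            # bytes: 64 69 72 2d d0 b8 0a
count_question_marks sample.txt   # prints 0
count_non_ascii_bytes sample.txt  # prints 2
```

Run against the real file, a non-zero second count suggests the non-ASCII bytes did reach the disk and the viewer is the likelier suspect; a file full of 3f bytes and no bytes above 7f suggests the question marks were introduced before or during the write.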

What you could do is:

  1. Start by removing the encoding specs: do you get the conversion errors? Examine the default encoding places ("place" is the fancy Lisp word for things like the symbol macro *pathname-encoding* &c).
  2. Make sure *pathname-encoding* is utf-8 and try something like (coerce (pathname-name (car (directory "*"))) 'list). In my example above I see (#\CYRILLIC_SMALL_LETTER_I ...); do you see Unicode chars like I do, or do you see #\? instead?
  3. Try Cygwin (ls, ls | od, ls > foo; cat foo | od) to see whether it can capture the non-ASCII characters.
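
The last step can be sketched roughly as follows, assuming a POSIX-like shell such as the one Cygwin provides. The scratch directory, the file name and listing.txt are made up for illustration. The point of piping through od is that ls may substitute or escape characters it considers unprintable when it writes to a terminal, but typically passes the name bytes through unchanged when its output is piped or redirected.

```shell
# Scratch directory with one hypothetical non-ASCII file name:
# "f-" followed by U+0438 (UTF-8 bytes d0 b8).
mkdir -p scratch
cd scratch
touch "$(printf 'f-\320\270')"

# Dump the listing as hex bytes instead of trusting the terminal rendering.
ls | od -An -tx1

# Capture the listing outside the directory so it does not list itself.
ls > ../listing.txt
cd ..
od -An -tx1 listing.txt
```

If the dump shows bytes such as d0 b8, the shell tools can see the real names; if it shows 3f where the non-ASCII characters should be, the question marks are coming from a layer below CLISP.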

Upvotes: 2
