Plakhoy
Plakhoy

Reputation: 1872

How to tell a lisp reader function to ignore errors during parsing

I need a way to suppress any error messages raised when parsing code using read-from-string so that I can read from Clojure code using something like this:

(let* ((string-with-code "  '(func UpperCase \"string\")")
       (brace-pos (position #\( string-with-code))
       (end-of-token (+ brace-pos 
                        (position #\Space (subseq string-with-code brace-pos))))
       (first-token (subseq string-with-code (incf brace-pos) end-of-token)))
  (format t "Found function: '~a'" 
          (read-from-string first-token)))
  ;; ==> Found function: 'func'

It basically prints the function name from the lisp code in the string. It works Ok until you try to use the dot operator(.) as the first item in the list. Clojure uses the . both to cons and to access classes in a Java package and hence valid code like:

(defmacro chain
  ([x form] `(. ~x ~form))
  ([x form & more] `(chain (. ~x ~form) ~@more)))

would cause an error:

*** - READ from #<INPUT STRING-INPUT-STREAM>: token "." not allowed here

if I were to walk it printing every function in the code. I want a way to ignore/suppress the error messages from read-from-string for this code to work preferably without modifying the way the lisp reader works.

EDIT :

A complete program:

(defvar string-with-code "(defmacro chain
                    ([x form] `(d ~x ~form))
                    ([x form & more] `(chain (. ~x ~form) ~@more)))
    ")

(defvar end-of-token 0)
(defvar first-token 0)

(defun functions-in-string (code)
  (let ((bpos (position #\( code)))
    (unless (null bpos) 
      (setq end-of-token (+ bpos (position #\Space (subseq code bpos))))
      (setq first-token (subseq code (incf bpos) end-of-token))
      (ignore-errors
       (format t "Found function: '~(~A~)'~%" (read-from-string first-token)))
      (functions-in-string (subseq code end-of-token)))))

    ;; (ignore-errors
     ;; (functions-in-string 0 code))

(functions-in-string string-with-code)

OUTPUT :

Found function: 'defmacro'
Found function: '[x'
Found function: 'd'
Found function: '[x'
Found function: 'chain'
;; You'll get the error below if ignore-errors wraps around the function call
;; *** - READ from #<INPUT STRING-INPUT-STREAM>: token "." not allowed here

Upvotes: 0

Views: 898

Answers (2)

user797257
user797257

Reputation:

Here's a start for Clojure yacc parsers. This needs more attention from you to deal with special Clojure reader macros and possibly ensure some other grammar aspects, but this is already a functioning start:

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun print-args (&rest args) (format nil "~{~a~^ ~}" args) ))

(defun clojure-lexer (stream)
  (let ((digits '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
        (translations (make-hash-table)))
    (loop :for (key . value) :in
       '((#\( . oparen)
         (#\) . cparen)
         (#\[ . obracket)
         (#\] . cbracket)
         (#\' . squote)
         (#\` . accent)
         (#\: . colon)
         (#\, . comma)
         (#\@ . at)) :do
       (setf (gethash key translations) value))
    (labels ((%reverse-coerce (collected)
               (coerce (nreverse collected) 'string))
             (%read-token ()
               (loop
                  :with collected := nil
                  :and stringp := nil
                  :and commentp := nil
                  :for token := (read-char stream nil nil) :do
                  (cond
                    ((null token)
                     (return (and collected (%reverse-coerce collected))))
                    ((char= token #\;)
                     (push token collected)
                     (setf commentp t))
                    ((char= token #\")
                     (if commentp
                         (push token collected)
                         (if stringp
                             (progn
                               (push token collected)
                               (return (%reverse-coerce collected)))
                             (if collected
                                 (progn
                                   (unread-char token)
                                   (return (%reverse-coerce collected)))
                                 (progn
                                   (push token collected)
                                   (setf stringp t))))))
                    ((gethash token translations)
                     (if (or stringp commentp)
                         (push token collected)
                         (if collected
                             (progn
                               (unread-char token stream)
                               (return (%reverse-coerce collected)))
                             (return (gethash token translations)))))
                    ((member token '(#\Newline #\Rubout))
                     (if commentp
                         (return (and collected (%reverse-coerce collected)))
                         (if stringp
                             (push token collected)
                             (and collected (return (%reverse-coerce collected))))))
                    ((member token '(#\Space #\Tab))
                     (if (or stringp commentp)
                         (push token collected)
                         (and collected (return (%reverse-coerce collected)))))
                    (t (push token collected))))))
      (lambda ()
        (let* ((key (%read-token))
               (value (or (gethash key translations) key)))
          (if (null key)
              (values nil nil)
              (let ((terminal
                     (cond
                       ((member key '(oparen cparen squote colon accent
                                      comma at obracket cbracket))
                        key)
                       ((or (member (char key 0) digits)
                            (and (char= (char key 0) #\-)
                                 (> (length key) 1)
                                 (member (char key 1) digits)))
                        'number)
                       ((char= (char key 0) #\") 'string)
                       ((char= (char key 0) #\;) 'comment)
                       (t 'id))))
                (values terminal value))))))))

(yacc:define-parser *clojure-parser*
  (:start-symbol exp)
  (:terminals (id oparen cparen squote colon accent
                  comma at obracket cbracket string number))

  (exp
   (oparen id exp-list cparen #'print-args)
   (oparen id cparen #'print-args)
   (obracket exp-list cbracket #'print-args)
   (obracket cbracket #'print-args)
   (comment #'print-args)
   (accented-exp #'print-args)
   (quoted-exp #'print-args)
   (term #'print-args))

  (term id string number)
  (quoted-exp (quote exp))
  (accented-exp (accent exp) (accent at exp))
  (exp-list (exp exp-list) exp))

(defun parse-clojure (string)
  (yacc:parse-with-lexer
   (clojure-lexer (make-string-input-stream string)) *clojure-parser*))

(parse-clojure
 "(defn str-invoke [instance method-str & args]
            (clojure.lang.Reflector/invokeInstanceMethod 
                \"instance\" 123 
                method-str 
                (to-array args)))")

Results in:

;; "OPAREN defn (str-invoke
;;              (OBRACKET (instance (method-str (& args))) CBRACKET
;;               OPAREN clojure.lang.Reflector/invokeInstanceMethod (\"instance\"
;;                                                     (123
;;                                                      (method-str
;;                                                       OPAREN to-array args CPAREN))) CPAREN)) CPAREN"

Here's the BNF for the above grammar (not claiming it is the Clojure grammar, it only reflects the Lisp code above):

exp ::= '(' id exp-list ')'
      | '(' id ')'
      | '[' exp-list ']'
      | '[' ']'
      | ';' /[^\n]*/
      | accented-exp
      | quoted-exp
      | term

term ::= id | '"' /[^"]*/ '"' | /-?[0-9][^\s]+/
quoted-exp ::= '\'' exp
accented-exp ::= '`' exp | '`' '@' exp
exp-list ::= exp exp-list | exp
id ::= /[^()[\]:,`@']+/

For simplicity, some parts are given as regular expressions, those are delimited by //.

Upvotes: 2

Rainer Joswig
Rainer Joswig

Reputation: 139321

Not clear what you are asking, but ignoring errors simply is:

CL-USER 37 > (ignore-errors (read-from-string "(. foo bar)"))
NIL
#<CONDITIONS:SIMPLE-READER-ERROR 402000243B>

In case of an error, IGNORE-ERRORS returns NIL and as the second return value the condition.

If you want more control, you would need to write an error handler.

Upvotes: 3

Related Questions