An analogue to dotted-pairs for pattern matching in Clojure

Question

Scheme (and CL) has dotted-pairs where both elements of a cons cell are specified explicitly (e.g. (1 . 2)) rather than implicitly (e.g. (1 2) which is read as (1 . (2 . nil))).

I came across this puzzle where dotted-pairs are used in pattern matching to capture the tail of a list in the object being matched, e.g.:

(pmatch '(foo . (? pvar)) '(foo bar baz))
;;      => ((pvar bar baz))

Here '(foo . (? pvar)) is a pattern and '(foo bar baz) is the object matched against the pattern. foo in the pattern is a literal, whereas (? pvar) is a pattern variable which matches (bar baz) and binds the symbol pvar to that match. The pmatch function returns an association list of pattern variables and bound matches.

Had the pattern been '(foo (? pvar)), the match would fail, because baz would not match anything in the pattern.

I've implemented the puzzle in Clojure, and I pass all of JRM's test cases apart from the dotted-pair one. I'm trying to figure out how to possibly support the dotted-pair pattern as well.

Here's my current solution:

(defn pattern-variable? [pv]
  (when (seq? pv)
    (let [[qmark var] pv]
     (and (= (count pv) 2)
          (= qmark '?)
          (or (symbol? var)
              (keyword? var)))))

(defn pattern-variable [pv]
  (second pv))

(defn pmatch
  ([pat obj] (pmatch pat obj {}))
  ([pat obj binds]
     (cond (not (coll? pat))
           (when (= pat obj) binds)
           (pattern-variable? pat)
           (assoc binds (pattern-variable pat) obj)
           (seq? pat) (let [[pat-f & pat-r] pat]
                      (when (seq? obj)
                        (when-let [binds (pmatch pat-f (first obj) binds)]
                          (pmatch pat-r (next obj) binds))))
           :else nil)))

So how can I support patterns that match the rest of the object in Clojure without dotted-pairs?

Michał Marczyk · Accepted Answer

(Edit: Added a slightly longer, but significantly clearer matcher impl + a demo. The original remains below the horizontal rule.)

One solution would be to introduce a different notation to signify a variable to be matched against the tail of a seq, or "variable after dot". Another one would be to reserve & as a special symbol in patterns with the requirement that it may only be followed by a single pattern variable to be matched against the rest of the expression / object, which must be a seq. I'll explore the first approach below.

Here I took the liberty of changing the notation so that ~foo is a regular occurrence of the variable foo and ~@foo is a tail occurrence. (One could permit ~@-matching against subsequences, perhaps matching the minimal initial fragment of a sequence, if any, such that the remainder can be matched against the rest of the pattern; I'll just say this is out of scope for this answer, though. ;-))

Note that these really are different occurrences of the same variable -- i.e. there is still only one variable type -- since no distinction is being made between bindings arising from ~-occurrences and bindings arising from ~@-occurrences.

Also note that the examples in the post you linked to do not test for attempts to rebind a previously bound variable (e.g. try (pmatch '(~x ~x) '(foo bar)), (pmatch '((? x) (? x)) '(foo bar)) in the original syntax). The code below returns nil in such cases, as it does when the match fails for other reasons.

First, a demo:

user> (pmatch '(foo ~pvar1 ~pvar2 bar) '(foo 33 (xyzzy false) bar))
{pvar2 (xyzzy false), pvar1 33}
user> (pmatch '(~av ~@sv) '(foo bar baz))
{sv (bar baz), av foo}
user> (pmatch '(foo ~pvar1 ~pvar2 bar) '(foo 33 false bar))
{pvar2 false, pvar1 33}
user> (pmatch '(foo ~pvar bar) '(quux 33 bar))
nil
user> (pmatch '(a ~var1 (nested (c ~var2))) '(a b (nested (c d))))
{var2 d, var1 b}
user> (pmatch '(a b c) '(a b c))
{}
user> (pmatch '(foo ~pvar1 ~pvar2 bar) '(foo 33 (xyzzy false) bar))
{pvar2 (xyzzy false), pvar1 33}
user> (pmatch '(foo ~@pvar) '(foo bar baz))
{pvar (bar baz)}
user> (pmatch '(~? quux) '(foo quux))
{? foo}
user> (pmatch '~? '(foo quux))
{? (foo quux)}
user> (pmatch '(? ? ?) '(foo quux))
nil

Here's the matcher:

(defn var-type [pat]
  (when (seq? pat)
    (condp = (first pat)
      'clojure.core/unquote :atomic
      'clojure.core/unquote-splicing :sequential
      nil)))

(defn var-name [v]
  (when (var-type v)
    (second v)))

(defmulti pmatch*
  (fn [pat expr bs]
    (cond
      (= :atomic (var-type pat))        :atom
      (= :sequential (var-type pat))    nil
      (and (seq? pat) (seq? expr))      :walk
      (not (or (seq? pat) (seq? expr))) :exact
      :else                             nil)))

(defmethod pmatch* :exact [pat expr bs]
  (when (= pat expr) bs))

(defmethod pmatch* :atom [v expr bs]
  (if-let [[_ x] (find bs (var-name v))]
    (when (= x expr) bs)
    (assoc bs (var-name v) expr)))

(defmethod pmatch* :walk [pat expr bs]
  (if-let [[p] pat]
    (if (= :sequential (var-type p))
      (when (and (seq? expr) (not (next pat)))
        (if-let [[_ xs] (find bs (var-name p))]
          (when (= xs expr) bs)
          (assoc bs (var-name p) expr)))
      (when-let [[x] expr]
        (when-let [m (pmatch* p x bs)]
          (pmatch* (next pat) (next expr) m))))))

(defmethod pmatch* nil [& _] nil)

(defn pmatch
  ([pat expr] (pmatch pat expr {}))
  ([pat expr bs] (pmatch* pat expr bs)))

And here's the original monolithic version:

(defn pmatch
  ([pat expr] (pmatch pat expr {}))
  ([pat expr bs]
     (letfn [(atom-var? [pat]
               (and (seq? pat) (= 'clojure.core/unquote (first pat))))
             (seq-var? [pat]
               (and (seq? pat) (= 'clojure.core/unquote-splicing
                                  (first pat))))
             (v [var] (second var))
             (matcha [a e bs]
               (if-let [[_ x] (find bs (v a))]
                 (and (or (= x e) nil) bs)
                 (assoc bs (v a) e)))
             (matchs [s e bs]
               (when (seq? e)
                 (if-let [[_ xs] (find bs (v s))]
                   (or (= xs e) nil)
                   (assoc bs (v s) e))))]
       (when bs
         (cond
           (atom-var? pat)
           (matcha pat expr bs)

           (seq-var? pat)
           (matchs pat expr bs)

           (and (seq? pat) (seq? expr))
           (if-let [[p] pat]
             (if (seq-var? p)
               (matchs p expr bs)
               (when-let [[x] expr]
                 (when-let [m (pmatch p x bs)]
                   (recur (next pat) (next expr) m))))
             (when-not (first expr)
               bs))

           (not (or (seq? pat) (seq? expr)))
           (when (= pat expr)
             bs)

           :else nil)))))

An analogue to dotted-pairs for pattern matching in Clojure

Answers (1)

Related Questions