Random AST generation with the specified size in Clojure

Question

I would like to generate a random abstract syntax tree

(def terminal-set #{'x 'R})
(def function-arity {'+ 2, '- 2, '* 2, '% 2})
(def function-set (into #{} (keys function-arity)))
(def terminal-vec (into [] terminal-set))
(def function-vec (into [] function-set))

;; protected division
(defn % [^Number x ^Number y]
  (if (zero? y)
    0
    (/ x y)))

with the specified size

(defn treesize [tree] (count (flatten tree)))

following the algorithm from the book Sean Luke, 2013, Essentials of Metaheuristics, Lulu, second edition, available at https://cs.gmu.edu/~sean/book/metaheuristics/

We randomly extend the horizon of a tree with nonleaf nodes until the number of nonleaf nodes, plus the remaining spots, is greater than or equal to the desired size. We then populate the remaining slots with leaf nodes:

For example

(+ (* x (+ x x)) x)

is of size 7.
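As a quick check of that count, and of one edge case, here is a minimal sketch. `node-count` is a hypothetical name, not part of the code above. It counts nodes via `tree-seq` because `flatten` returns an empty sequence for a non-sequential argument, so a size helper built only on `flatten` would report 0 for a tree that is a single terminal.

```clojure
;; Hypothetical helper: count every symbol in the tree, treating a
;; lone terminal such as 'x as a tree of size 1.
(defn node-count [tree]
  (count (remove sequential? (tree-seq sequential? seq tree))))

(node-count '(+ (* x (+ x x)) x)) ;; => 7
(node-count 'x)                   ;; => 1

;; For comparison, flatten on a bare symbol yields an empty sequence:
(count (flatten 'x))              ;; => 0
```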

The algorithm in the book uses a collection Q of pointers/references, which is very convenient there. In my case I have to use some kind of recursion to construct the tree. The problem is that I can't keep track of the tree's current size across the recursive calls, which results in larger trees:

(defn ptc2-tree
  "Generate a random tree up to its `max-size`.
  Note: `max-size` is the number of nodes, not the same as its depth."
  [max-size]
  (if (> 2 max-size)
    (rand-nth terminal-vec)
    (let [fun   (rand-nth function-vec)
          arity (function-arity fun)]
      (cons fun (repeatedly arity #(ptc2-tree (- max-size arity 1)))))))

I also tried using an atom for the size, but still couldn't get the exact tree size I want; it was either too small or too big, depending on the implementation.

Besides this, I also have to somehow randomize the location where I insert the new node/tree.

How do I write this algorithm?

EDIT: A final touch to the correct solution:

(defn sequentiate [v] 
  (map #(if (seqable? %) (sequentiate %) %) (seq v)))

Aleph · Accepted Answer

The code below is more or less a word-for-word translation of the PTC2 algorithm in the article. It's not quite idiomatic Clojure; you may want to split it into smaller functions as you see fit.

(defn ptc2 [target-size]
  (if (= 1 target-size)
    (rand-nth terminal-vec)
    (let [f (rand-nth function-vec)
          arity (function-arity f)]
      ;; Generate a tree like [+ nil nil] and iterate upon it
      (loop [ast (into [f] (repeat arity nil))
             ;; q will be something like ([1] [2]), being a list of paths to the
             ;; nil elements in the AST
             q (for [i (range arity)] [(inc i)])
             c 1]
        (if (< (+ c (count q)) target-size)
          ;; Replace one of the nils in the tree with a new node
          (let [a (rand-nth q)
                f (rand-nth function-vec)
                arity (function-arity f)]
            (recur (assoc-in ast a (into [f] (repeat arity nil)))
                   (into (remove #{a} q)
                         (for [i (range arity)] (conj a (inc i))))
                   (inc c)))
          ;; In the end, fill all remaining slots with terminals
          (reduce (fn [t path] (assoc-in t path (rand-nth terminal-vec)))
                  ast q))))))

You can use Clojure's loop construct (or reduce) to keep the state of your iteration. In this algorithm, the state includes:

ast, which is a nested vector that represents the formula which is being built, where the not-yet-completed nodes are marked as nil;
q, which corresponds to Q in the pseudocode and is a list of the paths to unfinished nodes in the ast,
and c, which is the count of the non-leaf nodes in the tree.
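To make the role of the paths concrete, here is a small sketch of a single expansion step done by hand. The names `expanded` and `filled` are only illustrative.

```clojure
;; A partially built tree: nil marks a slot that is not filled yet.
(def ast '[+ nil nil])

;; Paths to the open slots. Index 0 of each node holds the function
;; symbol, so the arguments live at indices 1 and 2.
(def q [[1] [2]])

;; Expanding the slot at path [1] with a new function node:
(def expanded (assoc-in ast [1] '[* nil nil]))
;; expanded => [+ [* nil nil] nil]
;; The open slots are now [2], [1 1] and [1 2].

;; Filling the slot at path [1 2] with a terminal:
(def filled (assoc-in expanded [1 2] 'x))
;; filled => [+ [* nil x] nil]
```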

As a result, you get something like:

(ptc2 10) ;; => [* [- R [% R [% x x]]] [- x R]]
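A hedged sketch of the split into smaller functions mentioned above, plus a check of the sizes it produces. The names `open-node`, `child-paths`, `ptc2-split`, `node-count` and `expected-size` are hypothetical, and `terminal-vec` and `function-arity` are repeated from the question only to keep the block self-contained. Because every function in `function-arity` has arity 2, `c + (count q)` starts at 3 and grows by 2 per expansion, so the tree has exactly `target-size` nodes when that is odd and `target-size + 1` nodes when it is even (which is why `(ptc2 10)` above returns an 11-node tree).

```clojure
(def terminal-vec '[x R])
(def function-arity '{+ 2, - 2, * 2, % 2})
(def function-vec (vec (keys function-arity)))

(defn open-node
  "A fresh function node with nil placeholders, e.g. [+ nil nil]."
  [f]
  (into [f] (repeat (function-arity f) nil)))

(defn child-paths
  "Paths to the argument slots of `node`, which sits at `path`."
  [path node]
  (map #(conj path %) (range 1 (count node))))

(defn ptc2-split [target-size]
  ;; Assumption: any target of 1 or less yields a single terminal.
  (if (<= target-size 1)
    (rand-nth terminal-vec)
    (let [root (open-node (rand-nth function-vec))]
      (loop [ast root
             q   (vec (child-paths [] root))
             c   1]
        (if (< (+ c (count q)) target-size)
          ;; Expand one randomly chosen open slot into a function node.
          (let [path (rand-nth q)
                node (open-node (rand-nth function-vec))]
            (recur (assoc-in ast path node)
                   (into (vec (remove #{path} q)) (child-paths path node))
                   (inc c)))
          ;; Fill every remaining open slot with a random terminal.
          (reduce #(assoc-in %1 %2 (rand-nth terminal-vec)) ast q))))))

(defn node-count [tree]
  (count (remove sequential? (tree-seq sequential? seq tree))))

(defn expected-size [n]
  (cond (<= n 1) 1
        (odd? n) n
        :else    (inc n)))

;; Holds for every target, regardless of the random choices made:
(every? #(= (expected-size %) (node-count (ptc2-split %))) (range 1 40))
;; => true
```

If exact even sizes are needed, the function set would have to include a function of odd arity (for example a unary one); with only binary functions a full tree always has an odd number of nodes.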

We build the AST from vectors (as opposed to lists) because that lets us use assoc-in to progressively build the tree; you may want to convert it to nested lists yourself if you need to.
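One possible way to do that conversion is `clojure.walk/postwalk`; this is a sketch, and `vectors->lists` is a hypothetical name. The example tree is written by hand and the values chosen for `x` and `R` are arbitrary. A tree containing `%` would additionally need the protected division from the question to be defined in the namespace where the form is evaluated.

```clojure
(require '[clojure.walk :as walk])

(defn vectors->lists
  "Turn every vector in `tree` into a list, bottom-up."
  [tree]
  (walk/postwalk #(if (vector? %) (apply list %) %) tree))

(def tree '[* [- R x] [+ x R]])
(def form (vectors->lists tree))
;; form => (* (- R x) (+ x R))

;; The list form can be evaluated once x and R are bound:
(def result (eval (list 'let '[x 3 R 2] form)))
;; result => -5
```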
