Bojan Horvat
Bojan Horvat

Reputation: 223

Evaluating AST (abstract syntax tree) in Clojure

How to evaluate AST with better performance? Currently we create AST as tree where leaf nodes (terminals) are functions of one argument - map of keywords and their values. Terminals are represented with keywords, and functions (non-terminals) can be user (or clojure) defined functions. Full growth method creates tree from non-terminals and terminals:

(defn full-growth
  "Creates individual by full growth method: root and intermediate nodes are
   randomly selected from non-terminals Ns,
   leaves at depth depth are randomly selected from terminals Ts"
  [Ns Ts arity-fn depth]
  (if (<= depth 0)
    (rand-nth Ts)
    (let [n (rand-nth Ns)]
      (cons n (repeatedly (arity-fn n) #(full-growth Ns Ts arity-fn(dec depth)))))))

Example of generated AST:

=> (def ast (full-growth [+ *] [:x] {+ 2, * 2} 3))
#'gpr.symb-reg/ast
=> ast
(#object[clojure.core$_STAR_ 0x6fc90beb "clojure.core$_STAR_@6fc90beb"]
 (#object[clojure.core$_STAR_ 0x6fc90beb "clojure.core$_STAR_@6fc90beb"]
  (#object[clojure.core$_STAR_ 0x6fc90beb "clojure.core$_STAR_@6fc90beb"]
   :x
   :x)
  (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"]
   :x
   :x))
 (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"]
  (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"]
   :x
   :x)
  (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"]
   :x
   :x)))

, which is equivalent to

`(~* (~* (~* ~:x ~:x) (~+ ~:x ~:x)) (~+ (~+ ~:x ~:x) (~+ ~:x ~:x)))

(def ast `(~* (~* (~* ~:x ~:x) (~+ ~:x ~:x)) (~+ (~+ ~:x ~:x) (~+ ~:x ~:x))))

We can write fn which directly evaluates this AST as:

(defn ast-fn
  [{x :x}]
  (* (* (* x x) (+ x x)) (+ (+ x x) (+ x x))))

=> (ast-fn {:x 3})
648

We have two methods for creating function based on AST, one with help of apply and map, and the other with help of comp and juxt:

(defn tree-apply
  "((+ :x :x) in) => (apply + [(:x in) (:x in))]"
  ([tree] (fn [in] (tree-apply tree in)))
  ([tree in]
    (if (sequential? tree)
    (apply (first tree) (map #(tree-apply % in) (rest tree)))
    (tree in))))
#'gpr.symb-reg/tree-apply

=> (defn tree-comp
     "(+ :x :x) => (comp (partial apply +) (juxt :x :x))"
     [tree]
     (if (sequential? tree)
       (comp (partial apply (first tree)) (apply juxt (map tree-comp (rest tree))))
       tree))
#'gpr.symb-reg/tree-comp


=> ((tree-apply ast) {:x 3})
648

=> ((tree-comp ast) {:x 3})
648

With timing fn we measure time for executing functions over test cases:

=> (defn timing
     [f interval]
     (let [values (into [] (map (fn[x] {:x x})) interval)]
       (time (into [] (map f) values)))
       true)

=> (timing ast-fn (range -10 10 0.0001))
"Elapsed time: 37.184583 msecs"
true

=> (timing (tree-comp ast) (range -10 10 0.0001))
"Elapsed time: 328.961435 msecs"
true

=> (timing (tree-apply ast) (range -10 10 0.0001))
"Elapsed time: 829.483138 msecs"
true

As you can see there is huge difference in performance between direct function (ast-fn), tree-comp generated function and tree-apply generated function.

Is there some better way?

Edit: madstap's answer looks quite promising. I made some modifications on his solution (terminals could be also some other functions, not just keyword, like constant function which constantly returns value, regardless of input):

(defn c [v] (fn [_] v))
(def c1 (c 1))

(defmacro full-growth-macro
     "Creates individual by full growth method: root and intermediate nodes are
      randomly selected from non-terminals Ns,
      leaves at depth depth are randomly selected from terminals Ts"
     [Ns Ts arity-fn depth]
     (let [tree (full-growth Ns Ts arity-fn depth)
           val-map (gensym)
           ast2f (fn ast2f [ast] (if (sequential? ast)
                   (list* (first ast) (map #(ast2f %1) (rest ast)))
                   (list ast val-map)))
           new-tree (ast2f tree)]
       `{:ast '~tree
         :fn (fn [~val-map] ~new-tree)}))

Now, creating ast-m (with use of constant c1 as terminal) and associated ast-m-fn:

=> (def ast-m (full-growth-macro [+ *] [:x c1] {+ 2 * 2} 3))
#'gpr.symb-reg/ast-m
=> ast-m
{:fn
 #object[gpr.symb_reg$fn__20851 0x31802c12 "gpr.symb_reg$fn__20851@31802c12"],
 :ast
 (+
  (* (+ :x :x) (+ :x c1))
  (* (* c1 c1) (* :x c1)))}
=> (defn ast-m-fn
     [{x :x}]
     (+
     (* (+ x x) (+ x 1))
     (* (* 1 1) (* x 1))))
#'gpr.symb-reg/ast-m-fn

Timing looks very similar:

=> (timing (:fn ast-m) (range -10 10 0.0001))
"Elapsed time: 58.478611 msecs"
true
=> (timing (:fn ast-m) (range -10 10 0.0001))
"Elapsed time: 53.495922 msecs"
true
=> (timing ast-m-fn (range -10 10 0.0001))
"Elapsed time: 74.412357 msecs"
true
=> (timing ast-m-fn (range -10 10 0.0001))
"Elapsed time: 59.556227 msecs"
true

Upvotes: 5

Views: 902

Answers (2)

madstap
madstap

Reputation: 1552

Use a macro to write the equivalent of ast-fn.

(ns foo.core
  (:require
   [clojure.walk :as walk]))

(defmacro ast-macro [tree]
  (let [val-map (gensym)
        new-tree (walk/postwalk (fn [x]
                                  (if (keyword? x)
                                    (list val-map x)
                                    x))
                                (eval tree))]
    `(fn [~val-map] ~new-tree)))

On my machine this comes close to the perf of ast-fn. 45 msecs to 50 msecs. It does more lookups, but that can be fixed with some extra tinkering.

Edit: I thought some more about this. evaling the argument at macroexpansion time will limit how you can use this (the argument can't be a local). Making full-growth a macro could work better. Like amalloy says, it's all about what you want to do at runtime vs macroexpansion time.

(defmacro full-growth-macro
  "Creates individual by full growth method: root and intermediate nodes are
   randomly selected from non-terminals Ns,
   leaves at depth depth are randomly selected from terminals Ts"
  [Ns Ts arity-fn depth]
  (let [tree (full-growth Ns Ts arity-fn depth)
        val-map (gensym)
        new-tree (walk/postwalk (fn [x]
                                  (if (keyword? x)
                                    (list val-map x)
                                    x))
                                tree)]
    `{:ast '~tree
      :fn (fn [~val-map] ~new-tree)}))

Upvotes: 1

amalloy
amalloy

Reputation: 91877

You are reimplementing a sizeable chunk of what the compiler does, in a much less efficient way, using hashmaps for variable lookup by name at runtime. Normally the compiler can pre-resolve locals to a known place on the stack, and look them up with a single bytecode instruction, but you force it to call many functions in order to find out what variable to use for x. Likewise, you go through several levels of dynamic dispatch in order to find out that you want to call *, whereas typically the compiler can see a literal * in the source code and emit a simple call to clojure.lang.Numbers/multiply.

By deferring all this stuff to runtime, you impose an unavoidable penalty on yourself. I think you've done as much as you can to speed things up already.

Upvotes: 1

Related Questions