common-lispevallambdaprocessing-efficiency

Reputation: 169

Possible to do this without using eval in Common Lisp?

In my little project I have two arrays, lets call them A and B. Their values are #(1 2 3) and #(5 6 7). I also have two lists of symbols of identical length, lets call them C and D. They look like this: (num1 num2 num3) and (num2 num3 num4).

You could say that the symbols in lists C and D are textual labels for the values in the arrays A and B. So num1 in A is 1. num2 in A is 2. num2 in B is 5. There is no num1 in B, but there is a num3, which is 6.

My goal is to produce a function taking two arguments like so:

(defun row-join-function-factory (C D)
...body...)

I want it to return a function of two arguments:

(lambda (A B) ...body...)

such that this resulting function called with arguments A and B results in a kind of "join" that returns the new array: #(1 5 6 7)

The process taking place in this later function obtained values from the two arrays A and B such that it produces a new array whose members may be represented by (union C D). Note: I haven't actually run (union C D), as I don't actually care about the order of the symbols contained therein, but lets assume it returns (num1 num2 num3 num4). The important thing is that (num1 num2 num3 num4) corresponds as textual labels to the new array #(1 5 6 7). If num2, or any symbol, exists in both C and D, and subsequently represents values from A and B, then the value from B corresponding to that symbol is kept in the resulting array rather than the value from A.

I hope that gets the gist of the mechanical action here. Theoretically, I want row-join-function-factory to be able to do this with arrays and symbol-lists of any length/contents, but writing such a function is not beyond me, and not the question.

The thing is, I wish the returned function to be insanely efficient, which means that I'm not willing to have the function chase pointers down lists, or look up hash tables at run time. In this example, the function I require to be returned would be almost literally:

      (lambda (A B) 
(make-array 4 
    :initial-contents (list (aref A 0) (aref B 0) (aref B 1) (aref B 2))))

I do not want the array indexes calculated at run-time, or which array they are referencing. I want a compiled function that does this and this only, as fast as possible, which does as little work as possible. I do not care about the run-time work required to make such a function, only the run-time work required in applying it.

I have settled upon the use of (eval ) in row-join-function-factory to work on symbols representing the lisp code above to produce this function. I was wondering, however, if there is not some simpler method to pull off this trick that I am not thinking of, given one's general cautiousness about the use of eval...

By my reasoning, i cannot use macros by themselves, as they cannot know what all values and dimensions A, B, C, D could take at compile time, and while I can code up a function that returns a lambda which mechanically does what I want, I believe my versions will always be doing some kind of extra run-time work/close over variables/etc...compared to the hypothetical lambda function above

Thoughts, answers, recommendations and the like are welcome. Am I correct in my conclusion that this is one of those rare legitimate eval uses? Apologies ahead of time for my inability to express the problem as eloquently in english...

(or alternatively, if someone can explain where my reasoning is off, or how to dynamically produce the most efficient functions...)

Upvotes: 1

Answers (4)

6502

Reputation: 114579

Given C and D you could create a closure like

(lambda (A B)
   (do ((result (make-array n))
        (i 0 (1+ i)))
       ((>= i n) result)
     (setf (aref result i)
           (aref (if (aref use-A i) A B)
                 (aref use-index i)))))

where n, use-A and use-index are precomputed values captured in the closure like

n         --> 4
use-A     --> #(T nil nil nil)
use-index --> #(0 0 1 2)

Checking with SBCL (speed 3) (safety 0) the execution time was basically identical to the make-array + initial-contents version, at least for this simple case.

Of course creating a closure with those precomputed data tables doesn't even require a macro.

Have you actually timed how much are you going to save (if anything) using an unrolled compiled version?

EDIT

Making an experiment with SBCL the closure generated by

(defun merger (clist1 clist2)
  (let ((use1 (list))
        (index (list))
        (i1 0)
        (i2 0))
    (dolist (s1 clist1)
      (if (find s1 clist2)
          (progn
            (push NIL use1)
            (push (position s1 clist2) index))
          (progn
            (push T use1)
            (push i1 index)))
      (incf i1))
    (dolist (s2 clist2)
      (unless (find s2 clist1)
        (push NIL use1)
        (push i2 index))
      (incf i2))
    (let* ((n (length index))
           (u1 (make-array n :initial-contents (nreverse use1)))
           (ix (make-array n :initial-contents (nreverse index))))
      (declare (type simple-vector ix)
               (type simple-vector u1)
               (type fixnum n))
      (print (list u1 ix n))
      (lambda (a b)
        (declare (type simple-vector a)
                 (type simple-vector b))
        (let ((result (make-array n)))
          (dotimes (i n)
            (setf (aref result i)
                  (aref (if (aref u1 i) a b)
                        (aref ix i))))
          result)))))

runs about 13% slower than an hand-written version providing the same type declarations (2.878s instead of 2.529s for 100,000,000 calls for the (a b c d)(b d e f) case, a 6-elements output).

The inner loop for the data based closure version compiles to

; 470: L2:   4D8B540801       MOV R10, [R8+RCX+1]   ; (aref u1 i)
; 475:       4C8BF7           MOV R14, RDI          ; b
; 478:       4C8BEE           MOV R13, RSI          ; source to use (a for now)
; 47B:       4981FA17001020   CMP R10, 537919511    ; (null R10)?
; 482:       4D0F44EE         CMOVEQ R13, R14       ; if true use b instead
; 486:       4D8B540901       MOV R10, [R9+RCX+1]   ; (aref ix i)
; 48B:       4B8B441501       MOV RAX, [R13+R10+1]  ; load (aref ?? i)
; 490:       4889440B01       MOV [RBX+RCX+1], RAX  ; store (aref result i)
; 495:       4883C108         ADD RCX, 8            ; (incf i)
; 499: L3:   4839D1           CMP RCX, RDX          ; done?
; 49C:       7CD2             JL L2                 ; no, loop back

The conditional is not compiled to a jump but to a conditional assignment (CMOVEQ).

I see a little room for improvement (e.g. using CMOVEQ R13, RDI directly, saving an instruction and freeing a register) but I don't think this would shave off that 13%.

Upvotes: 0

acelent

Reputation: 8135

From what I understand, you need to precompute the vector size and the aref args.

(defun row-join-function-factory (C D)
  (flet ((add-indices (l n)
           (loop for el in l and i from 0 collect (list el n i))))
    (let* ((C-indices (add-indices C 0))
           (D-indices (add-indices D 1))
           (all-indices (append D-indices
                                (set-difference C-indices
                                                D-indices
                                                :key #'first)))
           (ns (mapcar #'second all-indices))
           (is (mapcar #'third all-indices))
           (size (length all-indices)))
      #'(lambda (A B)
          (map-into (make-array size)
                    #'(lambda (n i)
                        (aref (if (zerop n) A B) i))
                    ns is)))))

Note that I used a number to know if either A or B should be used instead of capturing C and D, to allow them to be garbage collected.

EDIT: I advise you to profile against a generated function, and observe if the overhead of the runtime closure is higher than e.g. 5%, against a special-purpose function:

(defun row-join-function-factory (C D)
  (flet ((add-indices (l n)
           (loop for el in l and i from 0 collect (list el n i))))
    (let* ((C-indices (add-indices C 0))
           (D-indices (add-indices D 1))
           (all-indices (append D-indices
                                (set-difference C-indices
                                                D-indices
                                                :key #'first)))
           (ns (mapcar #'second all-indices))
           (is (mapcar #'third all-indices))
           (size (length all-indices))
           (j 0))
      (compile
       nil
       `(lambda (A B)
          (let ((result (make-array ,size)))
            ,@(mapcar #'(lambda (n i)
                          `(setf (aref result ,(1- (incf j)))
                                 (aref ,(if (zerop n) 'A 'B) ,i)))
                      ns is)
            result))))))

And validate if the compilation overhead indeed pays off in your implementation.

I argue that if the runtime difference between the closure and the compiled lambda is really small, keep the closure, for:

A cleaner coding style
Depending on the implementation, it might be easier to debug
Depending on the implementation, the generated closures will share the function code (e.g. closure template function)
It won't require a runtime license that includes the compiler in some commercial implementations

Upvotes: 2

Paul Nathan

Reputation: 40319

Depends on when you know the data. If all the data is known at compile time, you can use a macro (per sds's answer).

If the data is known at run-time, you should be looking at loading it into an 2D array from your existing arrays. This - using a properly optimizing compiler - should imply that a lookup is several muls, an add, and a dereference.

By the way, can you describe your project in a wee bit more detail? It sounds interesting. :-)

Upvotes: 0

sds

Reputation: 60054

I think the right approach is to have a macro which would compute the indexes at compile time:

(defmacro my-array-generator (syms-a syms-b)
  (let ((table '((a 0) (b 0) (b 1) (b 2)))) ; compute this from syms-a and syms-b
    `(lambda (a b)
       (make-array ,(length table) :initial-contents
             (list ,@(mapcar (lambda (ai) (cons 'aref ai)) table))))))

And it will produce what you want:

(macroexpand '(my-array-generator ...))
==>
#'(LAMBDA (A B)
    (MAKE-ARRAY 4 :INITIAL-CONTENTS
                (LIST (AREF A 0) (AREF B 0) (AREF B 1) (AREF B 2))))

So, all that is left is to write a function which will produce

((a 0) (b 0) (b 1) (b 2))

given

syms-a = (num1 num2 num3)

and

syms-b = (num2 num3 num4)

Upvotes: 1

Possible to do this without using eval in Common Lisp?

Answers (4)

EDIT

Related Questions