Reputation:
This Common Lisp function, which simply computes four vertices of a wall's wireframe edges with extremely simple kindergarten-level arithmetic and a few 'case' tests appears to be responsible for dynamically allocating 196608 bytes per rendered frame; SBCL's profiler tells me this is my most problematic function as far as consing goes. To give a general idea of what I am working on, it is a small first-person dungeon crawler game, and a dungeon is exactly 32x32 cells, and each cell has 4 walls. 32 * 32 * 4 * x = 196608, and so x turns out to be 48, which so happens to be 4 * 12 (4 walls * 12 floats per wall? Maybe not).
Now, I can easily mitigate this performance problem by using OpenGL display lists within gameplay mode, and I suppose that is what I will go ahead and do. Still, 1) I generally don't prematurely optimize, and more importantly 2) I still don't like leaving certain bothersome itches like this un-scratched, and I wonder what else I could have done. My function as-is, is as follows:
(defun calculate-wall-points (x y wall)
(declare (integer x y)
(keyword wall))
"Return the 4 vertices (12 floats) of a given dungeon cell wall"
(let ((xf (coerce x 'float))
(yf (coerce y 'float)))
(case wall
(:SOUTH
(values xf yf 0.0
(1+ xf) yf 0.0
(1+ xf) yf 1.0
xf yf 1.0))
(:WEST
(values xf yf 0.0
xf yf 1.0
xf (1+ yf) 1.0
xf (1+ yf) 0.0))
(:NORTH
(values xf (1+ yf) 0.0
xf (1+ yf) 1.0
(1+ xf) (1+ yf) 1.0
(1+ xf) (1+ yf) 0.0))
(:EAST
(values (1+ xf) (1+ yf) 0.0
(1+ xf) (1+ yf) 1.0
(1+ xf) yf 1.0
(1+ xf) yf 0.0))
(otherwise
(error "Not a valid heading passed for wall in function calculate-wall-points: ~A" wall)))))
To summarize a couple of things I have tried to fix this:
Doing a 'declare' to 'optimize' for 'speed' at 3 and everything else at 0 (both in this function, as well as the only function that calls it). Strangely, the profiler did report this function consing slightly less... but it still consed. I'm aiming for zero-consing. Arithmetic should not cons.
Then I thought 'values' might be doing this. Perhaps, I thought, it's internally just something like the function 'list', which, without a doubt, conses (the 'list' function's only purpose in the universe). What did I do to try to mitigate this? Just for experiment's sake, I modified the file to make a single wall-vertex-buffer global array, sized to fit 12 elements of type float, and modified both this function to modify it, and the calling function to read from it after calling this function (so it would constantly be updating one single set of 12 floats kept in the same place in memory, instead of allocating anything). Strangely, this did NOT stop this function from being a consing-piggy! So... was 'case' doing the consing? I do find it interesting that, mentioned earlier, that mystery number was 48. 48 = 4 * 12, perhaps those 4 case tests times 12 floats per 'values' call. Or, that could be a coincidence, with 48 bytes meaning something else (since a float is NOT 1 byte, I suspect is -is- something else). This seems significant, but I can't quite wrap my head around what my next approach should be.
Tried replacing 'case' with a 'cond' equivalent, just grasping for straws at this point, did not do anything either.
So where could this function's "mystery consing" be coming from? How would you more experienced Lisp programmers approach this tricky gremlin of a problem?
(EDIT) for @FaheemMitha, is the function that used the calculate-wall-points function; that troublesome function was later inlined with (declaim (inline calculate-wall-points)) just before calculate-wall-points's definition:
(defun render-dungeon-room (dungeon-object x y)
(declare (optimize (speed 3) (space 0) (debug 0)))
(declare (type fixnum x y))
(let ((cell (cell-at dungeon-object x y)))
(unless (null cell)
(dolist (wall-heading +basic-headings+)
(unless (eq wall-heading (opposite-heading *active-player-heading*))
(when (eql (get-wall-type cell wall-heading) :NORMAL)
(multiple-value-bind (v1x v1y v1z v2x v2y v2z v3x v3y v3z v4x v4y v4z)
(calculate-wall-points x y wall-heading)
(declare (type float v1x v1y v1z v2x v2y v2z v3x v3y v3z v4x v4y v4z))
(gl:with-primitive :quads
(if (is-edit-mode)
(case wall-heading
(:NORTH
(gl:color 0.4 0.4 0.4))
(:WEST
(gl:color 0.4 0.0 0.0))
(:SOUTH
(gl:color 0.0 0.0 0.4))
(:EAST
(gl:color 0.0 0.4 0.0)))
(gl:color 0.1 0.1 0.1))
(gl:vertex (the float v1x)
(the float v1y)
(the float v1z))
(gl:vertex (the float v2x)
(the float v2y)
(the float v2z))
(gl:vertex (the float v3x)
(the float v3y)
(the float v3z))
(gl:vertex (the float v4x)
(the float v4y)
(the float v4z)))
(gl:color 1.0 1.0 1.0)
(gl:with-primitive :line-loop
(gl:vertex (the float v1x)
(the float v1y)
(the float v1z))
(gl:vertex (the float v2x)
(the float v2y)
(the float v2z))
(gl:vertex (the float v3x)
(the float v3y)
(the float v3z))
(gl:vertex (the float v4x)
(the float v4y)
(the float v4z)))))))))
nil)
Upvotes: 7
Views: 717
Reputation: 139251
The consed memory is caused by allocating the floats. Each function call returns floats, actually 32 bit single-floats
. Consing means that some data is allocated on the heap: cons cells, numbers, arrays, ...
A single-float
is 32bit memory object. 4 bytes.
(+ 1.0 2.0) -> 3.0
In above case 3.0
is a new float, possibly newly consed.
(+ (+ 1.0 2.0) 4.0) -> 7.0)
Now what above the calculation above? The inner +
operation returns a float 3.0
. What happens to it?
Now what happens with these floats later? Are they stored somehow? In a list? In a new array? In a new structure
? In a new CLOS object?
Above makes it clear that it depends on the processor architecture and the compiler strategy. A x86 has not many registers. The 64bit version has more. A RISC processor may have even more registers. Then how large is the stack and how large are typical stack frames?
For more complex calculations involving several functions an optimizing compiler might be able to optimize which values stay in registers and thus reduce consing.
Above also makes it clear that for Common Lisp there is no fully general recipe how to make float operations non-consing. The ability to reduce consing depends on some general ideas and a lot of compiler/architecture specific tricks.
Since you are using SBCL, it is best to ask for advice on the SBCL mailing list and also tell them about OS, architecture (intel, arm, ...) and if it is running in 32bit or 64bit mode. More context code is also needed to come up with a better picture how to get consing reduced.
Some background information for reading:
Upvotes: 8
Reputation:
What does the compiler say? If you optimize for speed it should complain loudly about not being able to open-code arithmetic.
Next, what is happening with the coercing? Is this open-coded too?
Finally, remember that you can usually inspect the assembly-code a function generates with disassemble()
Upvotes: 0