Andrei
Andrei

Reputation: 2665

Why are lists faster then vectors for simple access?

I know there are similar question, e.g. Arrays vs. lists in Lisp: Why are lists so much faster in the code below?, but lists seem to be faster then vectors even for simple element access, which is counter-intuitive. Here is my example:

(let ((x '(0 1 2 3 4 5 6 7 8 9)))
  (time (dotimes (i 1000000000) (setf (nth 9 x) 0))))

(let ((x #(0 1 2 3 4 5 6 7 8 9)))
  (time (dotimes (i 1000000000) (setf (aref x 9) 0))))

The first block runs almost two times faster on my machine, with SBCL, than the second block. (And if I test read access times, then both lists and vectors have almost the same time). I would expect the list to be some 10 times slower, because it needs to traverse to the end to find the 9th element. Why is list faster?

Upvotes: 3

Views: 600

Answers (2)

coredump
coredump

Reputation: 38799

I cannot reproduce your observations. Here below is what I measure, along with two other tests with different types of arrays. Note first that my setup script does the following:

(sb-ext:restrict-compiler-policy 'debug 3)
(sb-ext:restrict-compiler-policy 'safety 3)

This might have an impact on code compilation.

Your tests

(print :list)
(let ((x (list 0 1 2 3 4 5 6 7 8 9)))
  (time (dotimes (i 1000000000) (setf (nth 9 x) 0))))

(print :simple-vector)
(let ((x (vector 0 1 2 3 4 5 6 7 8 9)))
  (time (dotimes (i 1000000000) (setf (aref x 9) 0))))

The trace is as follows:

:LIST 
Evaluation took:
  6.649 seconds of real time
  6.649545 seconds of total run time (6.649545 user, 0.000000 system)
  100.02% CPU
  21,224,869,441 processor cycles
  0 bytes consed


:SIMPLE-VECTOR 
Evaluation took:
  0.717 seconds of real time
  0.716708 seconds of total run time (0.716708 user, 0.000000 system)
  100.00% CPU
  2,287,130,610 processor cycles
  0 bytes consed

Regarding your original question, there is an almost 10 speed factor (9.27) difference between using lists and vectors. I don't know why your tests show differently (edit: so apparently it was related to undefined behaviours, I saw you mutated constant data and fixed it in my code without thinking it could be related :/)

Specialized arrays

I also tried to describe the array as precisely as possible, by declaring in its type the kind of elements it holds, (mod 10), and the array dimensions, (10). The actual array element type in SBCL is (UNSIGNED-BYTE 4), which means the array is likely to packed: multiple array elements can be stored in a machine word (32 or 64bits). Packing is good for space but needs coding/decoding steps when storing/accessing elements.

(print :packed-array)
(let ((x (make-array 10
                     :element-type '(mod 10)
                     :initial-contents '(0 1 2 3 4 5 6 7 8 9))))
  (declare (type (array (mod 10) (10)) x)
           (optimize (speed 3)))
  (time (dotimes (i 1000000000)
          (setf (aref x 9) (the (mod 10) 0)))))

I also have a version where the actual array element type (unsigned-byte 32), which should avoid doing conversions:

(print :unpacked-array)
(let ((x (make-array 10
                     :element-type '(unsigned-byte 32)
                     :initial-contents '(0 1 2 3 4 5 6 7 8 9))))
  (declare (type (array (unsigned-byte 32) (10)) x)
           (optimize (speed 3)))
  (time (dotimes (i 1000000000)
          (setf (aref x 9) (the (mod 10) 0)))))

Additional tests:

:PACKED-ARRAY 
Evaluation took:
  1.168 seconds of real time
  1.167929 seconds of total run time (1.167929 user, 0.000000 system)
  100.00% CPU
  3,727,528,968 processor cycles
  0 bytes consed

Here you can see that using the most precise type actually slows the tests in that particular case. This can probably be explained by the overhead of decoding/encoding operations (?)

:UNPACKED-ARRAY 
Evaluation took:
  0.231 seconds of real time
  0.231094 seconds of total run time (0.231094 user, 0.000000 system)
  100.00% CPU
  737,633,062 processor cycles
  0 bytes consed

With arrays of 32bits values, the execution time is smaller.

If you call disassemble for tests on both kinds of vectors (just remove the call to time that adds a lot of noise, and maybe lower the debug levels), eventually the difference boils down to which type of register is used in the loop:

The simple vector uses 64 bits (RCX) because the vector must be able to store objects of type T:

; C0: L2:   B900000000       MOV ECX, 0
; C5:       48894A49         MOV [RDX+73], RCX
; C9:       4883C002         ADD RAX, 2
; CD: L3:   483D00943577     CMP RAX, 2000000000
; D3:       7CEB             JL L2

The test with 32 bits array uses the ECX (32 bits) register.

; 960: L2:   31C9             XOR ECX, ECX
; 962:       894A25           MOV [RDX+37], ECX
; 965:       4883C002         ADD RAX, 2
; 969: L3:   483D00943577     CMP RAX, 2000000000
; 96F:       7CEF             JL L2

Also, the offsets are different, but consistent with the types:

- 37 = 4 (bytes) * 9 (index) + 1 (storage for array size?)
- 73 = 8 (bytes) * 9 (index) + 1 (storage for array size?)

Upvotes: 3

Renzo
Renzo

Reputation: 27424

The strange behaviour is due to the fact that you are modifying a constant data, and this can lead to an undefined behaviour (see the manual):

The consequences are undefined if literal objects (including quoted objects) are destructively modified.

If you change the the two expressions to modify two non-literal data structures, like in:

(let ((x (list 0 1 2 3 4 5 6 7 8 9)))
  (time (dotimes (i 1000000000) (setf (nth 9 x) 0))))

(let ((x (vector 0 1 2 3 4 5 6 7 8 9)))
  (time (dotimes (i 1000000000) (setf (aref x 9) 0))))

then the results will be completely different as very well shown in the other answer.

Upvotes: 4

Related Questions