I'm beginning to optimize performance of a crate, and I swapped out a `Vec` for a `VecDeque`. The container maintains elements in sorted order (it's supposed to be fairly small, so I didn't yet bother trying a heap) and is occasionally split down the middle into two separate containers (another reason I haven't yet tried a heap) with `drain`.

I'd expect this second operation to be much faster: I can copy the first half of the collection out, then simply rotate and decrease the length of the original (now second) collection. However, when I ran my `#[bench]` tests, performing the above operations a variable number of times (results below in million ns/iter), I observed a performance *decrease* with the `VecDeque`:
| | test a | test b | test c | test d |
|---|---|---|---|---|
| `Vec` | 12.6 | 5.9 | 5.9 | 3.8 |
| `VecDeque` | 13.6 | 8.9 | 7.3 | 5.8 |
A reproducible example (gist):
```rust
#![feature(test)]
extern crate test;

use std::collections::VecDeque;

fn insert_in_sorted_order_vec<E: Ord + Eq>(t: &mut Vec<E>, k: E) {
    match t.binary_search(&k) {
        Ok(i) => t[i] = k,
        Err(i) => t.insert(i, k),
    }
}

fn insert_in_sorted_order_vecdeque<E: Ord + Eq>(t: &mut VecDeque<E>, k: E) {
    match t.binary_search(&k) {
        Ok(i) => t[i] = k,
        Err(i) => t.insert(i, k),
    }
}

fn split_vec<T>(mut t: Vec<T>) -> (Vec<T>, Vec<T>) {
    let a = t.drain(0..(t.len() / 2)).collect();
    (a, t)
}

fn split_vecdeque<T>(mut t: VecDeque<T>) -> (VecDeque<T>, VecDeque<T>) {
    let a = t.drain(0..(t.len() / 2)).collect();
    (a, t)
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    static ITERS_BEFORE_SPLIT: u32 = 50;
    static ITERS_TIME: u32 = 10_000;

    #[bench]
    fn vec_manip(b: &mut Bencher) {
        b.iter(|| {
            let mut v = Vec::new();
            for i in 0..(ITERS_TIME / ITERS_BEFORE_SPLIT) {
                for j in 1..(ITERS_BEFORE_SPLIT + 1) {
                    insert_in_sorted_order_vec(&mut v, i * j / (i + j)); // 'random'-ish illustrative number
                }
                v = split_vec(v).1;
            }
        });
    }

    #[bench]
    fn vecdeque_manip(b: &mut Bencher) {
        b.iter(|| {
            let mut v = VecDeque::new();
            for i in 0..(ITERS_TIME / ITERS_BEFORE_SPLIT) {
                for j in 1..(ITERS_BEFORE_SPLIT + 1) {
                    insert_in_sorted_order_vecdeque(&mut v, i * j / (i + j)); // 'random'-ish illustrative number
                }
                v = split_vecdeque(v).1;
            }
        });
    }
}
```
The `Vec` implementation took 69.2k ns/iter, and the `VecDeque` implementation took 91.8k. I've repeated and verified these results a number of times. Why does performance decrease with this more flexible data structure?

These results were obtained by running `cargo bench` with its default options (optimized, no debug symbols as far as I can tell, etc.).

**Edit:** I changed the `split_vecdeque` function to use `split_off` instead of `drain().collect()` (see below). It looks like this method is guaranteed not to reallocate or shift anything around, instead just moving the `head` and `tail` pointers; see the documentation and implementation. That, however, performs even *worse* than the original `VecDeque` version, at 98.2k ns/iter. For larger values (`ITERS_BEFORE_SPLIT = 50_000`, `ITERS_TIME = 5_000_000`), though, performance (21.8m ns/iter) is better than `drain` (23.1m ns/iter) and worse than `Vec` (19.1m ns/iter).
```rust
fn split_vecdeque<T>(mut t: VecDeque<T>) -> (VecDeque<T>, VecDeque<T>) {
    let a = t.split_off(t.len() / 2);
    (t, a)
}
```
Upvotes: 6
Views: 6525
A `VecDeque` is like a `Vec` but supports pushing and popping from both ends efficiently. In order to do this, it uses a single, contiguous buffer (just like a `Vec`), but treats it as two partitions: a head and a tail.
The structure is laid out like this:

```rust
pub struct VecDeque<T> {
    tail: usize,
    head: usize,
    buf: RawVec<T>,
}
```
Items in the buffer are ordered like this:

```
[[tail: 5, 6, 7] ...unused... [head: 1, 2, 3, 4]]
```

Adding an item to the end of the collection will append to the tail, using some of the unused space. Adding to the start of the collection will add to the start of the head, eating into the same space. When the head and tail meet in the middle, the `VecDeque` is full and will need to reallocate.
Compared with `Vec`:

```rust
pub struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
}
```

Which uses its buffer like this:

```
[1, 2, 4, 5, 6, 7 ...unused...]
```
Adding an item at the end is fast, but adding an item at the start requires copying all of the existing items to make space.
Most operations on `VecDeque` are made more complicated by this layout, and this will slightly reduce its performance. Even retrieving its length is more complicated:

```rust
pub fn len(&self) -> usize {
    count(self.tail, self.head, self.cap())
}
```
The whole point of `VecDeque` is to make certain operations faster, namely pushing and popping at the start of the collection. `Vec` is very slow at this, especially if there are a lot of items, because it involves moving all of the other items to make space. The structure of `VecDeque` makes these operations fast, but at the expense of the performance of other operations in comparison to `Vec`.

Your tests don't appear to take advantage of `VecDeque`'s design, since they are dominated by calls to `insert`, which involves the expensive copying of many items in both cases.
Upvotes: 17