Threaded DES slower than not threaded

Question

I am having trouble getting a performance improvement from parallelizing a DES encryption algorithm.

Here is my attempt:

fn des(message: &[u8], subkeys: Vec<u64>) -> Vec<u8> {
    let mut pool = Pool::new(THREAD_COUNT);
    let message = message_to_u64s(message);

    crossbeam::scope(|scope| {
        pool.map(scope, message.iter().enumerate(), |(i, &block)| {
            let permuted = ip(block);
            let mut li = permuted & 0xFFFFFFFF00000000;
            let mut ri = permuted << 32;

            for subkey in &subkeys {
                let last_li = li;
                li = ri;
                ri = last_li ^ feistel(ri, *subkey);
            }

            let r16l16 = ri | (li >> 32);
            to_u8_vec(fp(r16l16))
        }).collect::<Vec<_>>()
    }).concat()
}

(This uses the crossbeam and simple_parallel crates, but I will accept solutions that don't use them.)

Unfortunately, this implementation is slower than the version without threads:

fn des(message: &[u8], subkeys: Vec<u64>) -> Vec<u8> {
    let message = message_to_u64s(message);

    let mut cipher = vec![];

    for block in message {
        let permuted = ip(block);
        let mut li = permuted & 0xFFFFFFFF00000000;
        let mut ri = permuted << 32;

        for subkey in &subkeys {
            let last_li = li;
            li = ri;
            ri = last_li ^ feistel(ri, *subkey);
        }

        let r16l16 = ri | (li >> 32);
        let mut bytes = to_u8_vec(fp(r16l16));
        cipher.append(&mut bytes);
    }

    cipher
}

I believe the collect and concat calls are the problem, but I don't know how to avoid them without using unsafe code.
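For illustration, here is a minimal sequential sketch of how the per-block Vec and the final concat can be avoided in safe code: allocate the whole output once and let each block write its 8 bytes into its own slot obtained from chunks_exact_mut. The names encrypt_block and encrypt_in_place are hypothetical; encrypt_block is a toy bit mixer standing in for ip, the Feistel rounds and fp (it is not DES). The sketch also assumes the message is already padded to a multiple of 8 bytes and that blocks are read big-endian, which may differ from what message_to_u64s does.

```rust
// Hypothetical placeholder for the per-block work (ip, 16 Feistel rounds, fp).
// It is NOT DES; it only mixes bits so the sketch compiles and runs on its own.
fn encrypt_block(block: u64, subkeys: &[u64]) -> u64 {
    subkeys.iter().fold(block, |acc, &k| (acc ^ k).rotate_left(7))
}

/// Encrypts `message` into a buffer allocated once up front.
/// Assumes the message is already padded to a multiple of 8 bytes.
fn encrypt_in_place(message: &[u8], subkeys: &[u64]) -> Vec<u8> {
    assert!(message.len() % 8 == 0, "message must be padded to 8-byte blocks");
    let mut cipher = vec![0u8; message.len()];
    // Each 8-byte input block is paired with its own 8-byte output slot,
    // so no per-block Vec is created and nothing needs to be concatenated.
    for (src, dst) in message.chunks_exact(8).zip(cipher.chunks_exact_mut(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(src);
        let block = u64::from_be_bytes(buf); // byte order is an assumption
        dst.copy_from_slice(&encrypt_block(block, subkeys).to_be_bytes());
    }
    cipher
}

fn main() {
    let subkeys: Vec<u64> = (1..=16).collect();
    let cipher = encrypt_in_place(b"sixteen byte msg", &subkeys); // 2 blocks
    println!("{:02x?}", cipher);
}
```

The disjoint output slots are also what makes a safe parallel version possible, since different threads can own different slices of the same buffer.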

So how can I improve the performance of this algorithm with threads, using safe code? (Solutions with unsafe code would also be interesting, but I believe there must be a solution without unsafe code.)

viraptor · Accepted Answer

Use a profiler. You could try guessing where the slowdown is, but you may not find the right place anyway.

But for an educated guess: I'd try splitting the message into THREAD_COUNT parts and feeding those parts to the thread pool instead. If you send each 8-byte block as a separate task, you'll spend more time managing the tasks than on the DES itself.
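A minimal sketch of that suggestion using only the standard library (std::thread::scope, available since Rust 1.63) instead of crossbeam and simple_parallel: the padded message and a preallocated output buffer are cut at the same offsets into at most thread_count contiguous parts, and each scoped thread encrypts one part in place. Because the parts returned by chunks_mut are disjoint, this needs no unsafe code, no per-block allocation and no concat. encrypt_block, encrypt_range and des_parallel are hypothetical names, encrypt_block is a placeholder rather than real DES, and big-endian block order is assumed.

```rust
use std::thread;

// Hypothetical placeholder for the per-block work (ip, 16 Feistel rounds, fp).
// It is NOT DES; replace it with the real block function.
fn encrypt_block(block: u64, subkeys: &[u64]) -> u64 {
    subkeys.iter().fold(block, |acc, &k| (acc ^ k).rotate_left(7))
}

/// Encrypts every 8-byte block of `input` into the matching slot of `output`.
fn encrypt_range(input: &[u8], output: &mut [u8], subkeys: &[u64]) {
    for (src, dst) in input.chunks_exact(8).zip(output.chunks_exact_mut(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(src);
        let block = u64::from_be_bytes(buf); // byte order is an assumption
        dst.copy_from_slice(&encrypt_block(block, subkeys).to_be_bytes());
    }
}

/// Splits the (already padded) message into at most `thread_count` contiguous
/// parts and encrypts each part on its own scoped thread.
fn des_parallel(message: &[u8], subkeys: &[u64], thread_count: usize) -> Vec<u8> {
    assert!(thread_count > 0, "need at least one thread");
    assert!(message.len() % 8 == 0, "message must be padded to 8-byte blocks");
    let mut cipher = vec![0u8; message.len()];

    // Round up so every block is covered; the part size stays a multiple of 8
    // bytes so no block is split between two threads.
    let blocks = message.len() / 8;
    let blocks_per_part = ((blocks + thread_count - 1) / thread_count).max(1);
    let part_bytes = blocks_per_part * 8;

    // The output parts are disjoint &mut slices, so each thread can write
    // its results without locks or unsafe code.
    thread::scope(|s| {
        for (input, output) in message.chunks(part_bytes).zip(cipher.chunks_mut(part_bytes)) {
            s.spawn(move || encrypt_range(input, output, subkeys));
        }
    }); // all scoped threads are joined before scope returns
    cipher
}

fn main() {
    let subkeys: Vec<u64> = (1..=16).collect();
    let message: Vec<u8> = (0..1024u32).map(|i| i as u8).collect(); // 128 blocks
    let cipher = des_parallel(&message, &subkeys, 4);
    println!("{} bytes encrypted", cipher.len());
}
```

This spawns at most thread_count threads per call instead of dispatching one task per 8-byte block. Whether it actually beats the sequential loop still depends on the message size, which is what profiling with realistic inputs would tell you.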
