m.raynal

Reputation: 3113

Program still runs on one thread using par_iter and par_extend

I'm trying to parallelize a portion of my code, and despite it using rayon and the parallel iterators par_iter() and par_extend(), it still looks like it runs on a single thread.

I simply create a vector of i64, fill it with a lot of values, and then move these values into a std::collections::HashSet of integers.

My single threaded code:

use std::collections::HashSet;

fn main() {
    let my_vec: Vec<i64> = (0..100_000_000).collect();

    let mut my_set: HashSet<i64> = HashSet::new();
    let st = std::time::Instant::now();
    my_set.extend(
        my_vec.iter().map(|x| x*(x+3)/77+44741)  // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("{:?}", dur);

}

The running time is around 8.86 s on average. Here is the code using parallel iterators:

extern crate rayon;
use rayon::prelude::*;
use std::collections::HashSet;

fn main() {
    let my_vec: Vec<i64> = (0..100_000_000).collect();

    let mut my_set: HashSet<i64> = HashSet::new();
    let st = std::time::Instant::now();
    my_set.par_extend(
        my_vec.par_iter().map(|x| x*(x+3)/77+44741) // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("{:?}", dur);
}

The average running time for the 'parallel' version is almost the same (8.62 s), and the CPU monitor clearly shows a single core working at 100% while the others just wait.

Do you know what I did wrong, or did not understand?

Upvotes: 2

Views: 1196

Answers (1)

Boiethios

Reputation: 42739

Your test is not representative because the computation is very cheap: it is several orders of magnitude faster than a thread context switch, so the cost of distributing the work outweighs any gain. The core at 100% is likely the rayon runtime, while the other cores wait for it.

If you replace your computation with a sleep, the results are as you expect:

use std::collections::HashSet;
use rayon::prelude::*; // 1.1.0
use std::time::Duration;

fn main() {
    fn slow(i: &i64) -> i64 {
        std::thread::sleep(Duration::from_millis(5));

        *i
    }

    let my_vec: Vec<i64> = (0..100).collect();

    let mut my_set: HashSet<i64> = HashSet::new();

    let st = std::time::Instant::now();
    my_set.extend(
        my_vec.iter().map(slow)  // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("Regular: {:?}", dur);

    let st = std::time::Instant::now();
    my_set.par_extend(
        my_vec.par_iter().map(slow) // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("Rayon: {:?}", dur);
}

Output:

Regular: 685.670791ms
Rayon: 316.733253ms

When you optimize code, benchmark it carefully, because parallelizing can sometimes make it slower.
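As a rough way to see where the time goes before reaching for rayon, the sketch below times the arithmetic from the question on its own and then together with the HashSet insertion. It uses only the standard library. The helper names (compute, time_compute, build_set) and the smaller input size are assumptions made for illustration, and single-run timings like these are only indicative, not a proper benchmark.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

// The per-element computation from the question.
fn compute(x: i64) -> i64 {
    x * (x + 3) / 77 + 44741
}

// Runs only the arithmetic and returns how long it took.
fn time_compute(values: &[i64]) -> Duration {
    let start = Instant::now();
    for &x in values {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(compute(x));
    }
    start.elapsed()
}

// Runs the arithmetic plus the HashSet insertion, as in the question.
fn build_set(values: &[i64]) -> HashSet<i64> {
    let mut set = HashSet::new();
    set.extend(values.iter().map(|&x| compute(x)));
    set
}

fn main() {
    // Hypothetical, smaller input so the sketch finishes quickly.
    let my_vec: Vec<i64> = (0..1_000_000).collect();

    let compute_only = time_compute(&my_vec);

    let start = Instant::now();
    let set = build_set(&my_vec);
    let compute_and_insert = start.elapsed();

    println!("compute only:       {:?}", compute_only);
    println!(
        "compute and insert: {:?} ({} distinct values)",
        compute_and_insert,
        set.len()
    );
}
```

If the first timing is a small fraction of the second, the arithmetic is not the part worth parallelizing. Build with --release, since debug builds distort both numbers.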

Upvotes: 3
