Reputation: 3113
I'm trying to parallelize a portion of my code, and despite it using rayon
and the parallel iterators par_iter()
and par_extend()
, it still looks like it runs on a single thread.
I simply create a vector of i64
, fill it up with a lot of values, and then move these values into a std::collections::HashSet
of integers.
My single-threaded code:
use std::collections::HashSet;

fn main() {
    let my_vec: Vec<i64> = (0..100_000_000).collect();
    let mut my_set: HashSet<i64> = HashSet::new();

    let st = std::time::Instant::now();
    my_set.extend(
        my_vec.iter().map(|x| x * (x + 3) / 77 + 44741) // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("{:?}", dur);
}
Running time is around 8.86 s
on average.
Here is the code using parallel iterators:
extern crate rayon;
use rayon::prelude::*;
use std::collections::HashSet;

fn main() {
    let my_vec: Vec<i64> = (0..100_000_000).collect();
    let mut my_set: HashSet<i64> = HashSet::new();

    let st = std::time::Instant::now();
    my_set.par_extend(
        my_vec.par_iter().map(|x| x * (x + 3) / 77 + 44741) // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("{:?}", dur);
}
The average running time of the 'parallel' version is almost the same (8.62 s
), and the CPU monitor clearly shows a single CPU working at 100% while the others just sit idle.
Do you know what I did wrong, or did not understand?
Upvotes: 2
Views: 1196
Reputation: 42739
Your benchmark does not measure what you think it does: the per-item calculation is so cheap that it is several orders of magnitude faster than a thread context switch. The core at 100% is likely the rayon runtime itself, while the other cores are waiting for it to hand out work.
If you replace the computation with a sleep, the results are as you expect:
use std::collections::HashSet;
use rayon::prelude::*; // 1.1.0
use std::time::Duration;

fn main() {
    fn slow(i: &i64) -> i64 {
        std::thread::sleep(Duration::from_millis(5));
        *i
    }

    let my_vec: Vec<i64> = (0..100).collect();
    let mut my_set: HashSet<i64> = HashSet::new();

    let st = std::time::Instant::now();
    my_set.extend(
        my_vec.iter().map(slow) // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("Regular: {:?}", dur);

    let st = std::time::Instant::now();
    my_set.par_extend(
        my_vec.par_iter().map(slow) // this is supposed to take a while to compute
    );
    let dur = st.elapsed();
    println!("Rayon: {:?}", dur);
}
Output:
Regular: 685.670791ms
Rayon: 316.733253ms
When you try to optimize your code, you must benchmark it carefully: parallelizing can sometimes make code slower, because the scheduling overhead outweighs the work done per item.
Upvotes: 3