Reputation: 11
Part of my program imports a file containing two matrices and multiplies them.
But I am confused: why does the concurrent version take longer than the one without concurrency?
Are there any bugs in my code?
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Instant;

// without concurrency
let mut result = vec![];
let time1 = Instant::now();
for i in 0..n {
    let mut temp_vector = vec![];
    for j in 0..n {
        let mut temp_num = 0;
        for multiple_count in 0..m {
            temp_num += arr1[i][multiple_count] * arr2[multiple_count][j];
        }
        temp_vector.push(temp_num);
    }
    result.push(temp_vector);
}
let time2 = Instant::now();
println!("normal solving result:\n");
for i in 0..n {
    for j in 0..n {
        print!("{:?} ", result[i][j]);
    }
    println!();
}
let pass = time2.duration_since(time1);
println!("{:?}\n", pass);
println!("concurrency solving solution:\n");
// start the concurrency
let mut handles = vec![];
let arr1 = Arc::new(RwLock::new(arr1));
let arr2 = Arc::new(RwLock::new(arr2));
let count_time1 = Instant::now();
for i in 0..n {
    for j in 0..n {
        let arr1 = Arc::clone(&arr1);
        let arr2 = Arc::clone(&arr2);
        let handle = thread::spawn(move || {
            let mut count = 0;
            let arr1 = arr1.try_read().unwrap();
            let arr2 = arr2.try_read().unwrap();
            for k in 0..m {
                count += arr1[i][k] * arr2[k][j];
            }
            count
        });
        handles.push(handle);
    }
}
// join the threads so the timing covers the actual work, not just the spawning
let concurrent_result: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
let count_time2 = Instant::now();
let pass_time = count_time2.duration_since(count_time1);
Upvotes: 1
Views: 594
Reputation: 22196
There are several reasons, but a few jump out:
You could probably get a pretty significant speed-up by ditching the RwLock: you don't need it, since your data is read-only, and the Arc is only adding to each thread's startup time and to the time it takes a thread to join (since it needs to drop the Arc). However, by far your biggest speedup will come from spawning only 4-8 threads, depending on your processor, rather than one per cell. I'll leave it to you how best to split the work into chunks, but it's fairly straightforward.
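One way to do that chunking, as a sketch: split the result matrix into row chunks, one per worker, using `std::thread::scope` (stable since Rust 1.63) so no `Arc` or `RwLock` is needed. The function name `multiply_chunked` and the element type are illustrative, not from the question.

```rust
use std::thread;

// Multiply arr1 (n×m) by arr2 (m×p), giving each scoped thread an
// exclusive chunk of output rows -- no locks, no Arc.
fn multiply_chunked(arr1: &[Vec<i64>], arr2: &[Vec<i64>], num_threads: usize) -> Vec<Vec<i64>> {
    let n = arr1.len();
    let m = arr2.len();
    let cols = arr2[0].len();
    let mut result = vec![vec![0i64; cols]; n];
    let rows_per_thread = (n + num_threads - 1) / num_threads;
    thread::scope(|s| {
        // chunks_mut hands out disjoint mutable row slices, so the
        // threads can write concurrently without synchronization
        for (chunk_idx, chunk) in result.chunks_mut(rows_per_thread).enumerate() {
            let row_offset = chunk_idx * rows_per_thread;
            s.spawn(move || {
                for (r, row) in chunk.iter_mut().enumerate() {
                    let i = row_offset + r;
                    for j in 0..cols {
                        let mut sum = 0;
                        for k in 0..m {
                            sum += arr1[i][k] * arr2[k][j];
                        }
                        row[j] = sum;
                    }
                }
            });
        }
    });
    result
}

fn main() {
    let a = vec![vec![1, 2], vec![3, 4]];
    let b = vec![vec![5, 6], vec![7, 8]];
    let c = multiply_chunked(&a, &b, 2);
    println!("{:?}", c); // [[19, 22], [43, 50]]
}
```

Scoped threads are joined automatically when the scope ends, which is why the borrowed slices are safe to hand out.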
Edit: In fact, you can probably get rid of the Arc too, since the threads immediately join, but depending on Rust's thread lifetime rules you may need the crossbeam::scope functionality from the crossbeam crate to actually make it work.
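Since Rust 1.63 the standard library offers the same scoped-thread functionality (inspired by crossbeam) as `std::thread::scope`, which lets spawned threads borrow local data directly, with no Arc or lock. A minimal sketch with illustrative data:

```rust
use std::thread;

fn main() {
    let arr1 = vec![vec![1, 2], vec![3, 4]]; // plain local data, no Arc/RwLock
    let mut results = Vec::new();
    thread::scope(|s| {
        // the spawned closure borrows arr1 directly; the scope guarantees
        // the thread is joined before arr1 goes out of scope
        let handle = s.spawn(|| arr1[0][0] + arr1[1][1]);
        results.push(handle.join().unwrap());
    });
    println!("{:?}", results); // [5]
}
```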
As an aside, once you move to concurrent writing to the same data structures, I highly encourage you to read up on the processor cache, specifically false sharing. While Mutexes are likely to be the higher cost in Rust, even if you can eschew them (e.g. by splitting a slice with split_at_mut), you can still get bad thrashing from threads constantly invalidating each other's cache lines around the chunk boundaries.
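For reference, this is what lock-free splitting looks like: `split_at_mut` yields two disjoint mutable halves that two threads can write to without a Mutex (scoped threads assumed; the data here is illustrative). The false-sharing caveat still applies where the halves meet.

```rust
use std::thread;

fn main() {
    let mut data = vec![0u64; 8];
    // Two non-overlapping mutable slices into the same buffer.
    let (left, right) = data.split_at_mut(4);
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x = 1)); // no Mutex needed
        s.spawn(|| right.iter_mut().for_each(|x| *x = 2));
    });
    println!("{:?}", data); // [1, 1, 1, 1, 2, 2, 2, 2]
    // Caveat: elements near the split point can share a cache line, so heavy
    // write traffic right at the boundary may still cause false sharing.
}
```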
Upvotes: 4
Reputation: 88526
Without looking into too much detail: you are spawning n² threads -- one for each cell in the result matrix. Spawning threads is expensive (note that Rust doesn't use "green threads", but system threads, by default).
Concurrency doesn't simply speed everything up; one has to be a bit smart about it. Usually you just want to utilize all CPU cores, so you should spawn only roughly as many threads as there are cores. In your case, spawning a thread probably takes much more time than the work the thread actually does, hence the slowdown.
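The core count can be queried at runtime with `std::thread::available_parallelism` (stable since Rust 1.59). A minimal sketch, where the squaring is just placeholder per-thread work:

```rust
use std::thread;

fn main() {
    // Fall back to 4 workers if the parallelism can't be determined.
    let num_threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    println!("spawning {} worker threads", num_threads);
    let handles: Vec<_> = (0..num_threads)
        .map(|id| thread::spawn(move || id * id)) // placeholder work
        .collect();
    let results: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("{:?}", results);
}
```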
Upvotes: 4