Reputation: 2843
I was thinking about using Rayon's parallel iterator feature, but I'm concerned about performance when iterating over small collections.
Parallelism overhead can sometimes cause a slowdown on small collections: iterating over 2 elements is slower if I do the necessary preparations for multi-threading than if I used a single-threaded version. If I have 40 million elements, parallelism will give me a linear performance improvement.
I read about ParallelIterator::weight
(0.6.0), but I don't understand whether I should optimize such corner cases for small collections myself, or whether Rayon is smart and handles everything under the hood.
if collection_is_small() {
    // Run the single-threaded version...
} else {
    // Use a parallel iterator.
}
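A minimal sketch of that dispatch, using only the standard library with std::thread::scope as a stand-in for Rayon's thread pool (the SMALL_THRESHOLD value and the sum_squares helper are assumptions for illustration, not part of any API):

```rust
use std::thread;

// Assumed cutoff for illustration; a real value should come from benchmarking.
const SMALL_THRESHOLD: usize = 1000;

// Sum the squares of `data`, choosing a strategy based on input size.
fn sum_squares(data: &[u64]) -> u64 {
    if data.len() < SMALL_THRESHOLD {
        // Small input: stay single-threaded to avoid coordination overhead.
        data.iter().map(|&x| x * x).sum()
    } else {
        // Large input: split into two halves and process each on its own
        // scoped thread (Rayon would do this splitting automatically).
        let (left, right) = data.split_at(data.len() / 2);
        thread::scope(|s| {
            let l = s.spawn(|| left.iter().map(|&x| x * x).sum::<u64>());
            let r = s.spawn(|| right.iter().map(|&x| x * x).sum::<u64>());
            l.join().unwrap() + r.join().unwrap()
        })
    }
}

fn main() {
    let small: Vec<u64> = (0..2).collect();
    let large: Vec<u64> = (0..2_000).collect();
    println!("{} {}", sum_squares(&small), sum_squares(&large));
}
```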
The ParallelIterator::weight
of each processed element is 1. See the relevant documentation for a precise definition; in short, processing a single element is cheap.
Google sent me to an old documentation page: weight
was deprecated and removed as of version 0.8.0.
Upvotes: 4
Views: 2726
Reputation: 2843
The weight API was deprecated in favor of split length control. By default, Rayon will split at every item, effectively making all computation parallel; this behavior can be configured via with_min_len.
Sets the minimum length of iterators desired to process in each thread. Rayon will not split any smaller than this length, but of course an iterator could already be smaller to begin with.
Producers like zip and interleave will use the greater of the two minimums. Chained iterators and iterators inside flat_map may each use their own minimum length.
extern crate rayon; // 1.0.3
use rayon::prelude::*;
use std::thread;

fn main() {
    println!("Main thread: {:?}", thread::current().id());

    let ids: Vec<_> = (0..4)
        .into_par_iter()
        .with_min_len(4)
        .map(|_| thread::current().id())
        .collect();

    println!("Iterations: {:?}", ids);
}
Output:
Main thread: ThreadId(0)
Iterations: [ThreadId(0), ThreadId(0), ThreadId(0), ThreadId(0)]
Playground (thanks to @shepmaster for code)
Upvotes: 3
Reputation: 430506
You can empirically see that such a behavior is not guaranteed:
use rayon::prelude::*; // 1.0.3
use std::thread;

fn main() {
    let ids: Vec<_> = (0..2)
        .into_par_iter()
        .map(|_| thread::current().id())
        .collect();

    println!("{:?}", ids);
}
Various runs of the program show:
[ThreadId(1), ThreadId(2)]
[ThreadId(1), ThreadId(1)]
[ThreadId(2), ThreadId(1)]
[ThreadId(2), ThreadId(2)]
That being said, you should perform your own benchmarking. By default, Rayon creates a global thread pool and uses work stealing to balance work between threads. The thread pool is a one-time setup cost per process, and work stealing helps ensure that work crosses thread boundaries only when needed. This is why some of the runs above use the same thread for both elements.
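A minimal benchmarking sketch using only the standard library (the two-thread split via std::thread stands in for Rayon's par_iter, and the sizes are arbitrary; for real measurements you would use the rayon crate and a proper harness such as criterion):

```rust
use std::thread;
use std::time::Instant;

// Single-threaded sum.
fn sum_sequential(data: &[u64]) -> u64 {
    data.iter().sum()
}

// Naive two-thread sum: split the slice in half and sum each half
// on its own scoped thread, then combine the partial results.
fn sum_two_threads(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    // Tiny vs. large input: the overhead/speedup trade-off shows up
    // in the timings, which vary by machine.
    for size in [2u64, 10_000_000] {
        let data: Vec<u64> = (0..size).collect();

        let t = Instant::now();
        let a = sum_sequential(&data);
        let seq = t.elapsed();

        let t = Instant::now();
        let b = sum_two_threads(&data);
        let par = t.elapsed();

        assert_eq!(a, b); // both strategies must agree
        println!("n = {:>8}: sequential {:?}, two threads {:?}", size, seq, par);
    }
}
```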
Upvotes: 0