Reputation: 53
I'm trying to parallelize some vector functions in a struct using openMP. While it works well with most of my implementations, I find that since the constructor for std::vector<>
has a linear complexity, I can't get better performance and instead get something that's even worse than doing it sequentially for initialization.
Here's one of the initializors
/**
* @brief Construct a new constant parallel Vector object with a given value constantEntry
*
* @param dim
* @param constantEntry
*/
parallelVector(const int dim, const double constantEntry){
dimension = dim;
values = std::vector<double>(dimension);
#pragma omp parallel for schedule(static)
for (int i=0 ; i<dimension; i++){
values[i] = constantEntry;
}
}
The std::vector<>
documentation says I can get O(1) complexity using allocators, but since I'm not too familiar with them, I was wondering if something with unique pointers is possible instead?
Upvotes: 2
Views: 165
Reputation: 275405
template<class T>
struct uninitialized_allocator:std::allocator<T> {
template<class...Us>
void construct( T* p, Us&&... us ){
::new((void*)p) T(std::forward<Us>(us)...);
}
// don't construct when passed 0 arguments
void construct( T* p ) {};
};
then later:
int dimension = 0;
std::vector<double, uninitialized_allocator<double>> values;
parallelVector(const int dim, const double constantEntry):
dimension(dim),
values(dim)
{
#pragma omp parallel for schedule(static)
for (int i=0 ; i<dimension; i++){
values[i] = constantEntry;
}
}
note, however, that you become responsible for initializing the new vector elements on calls to resize
as well as well as emplace_back()
with 0 arguments.
In theory, std::vector
still calls construct(T*)
on each and every element, but that is a noop, and compilers are good at dead-code elimination. So under non-trivial optimization settings this should do what you want.
Note that I changed a use of operator=
into construction; they are not the same, and vector
is free to behave very differently in the two cases.
Upvotes: 5