Laray
Laray

Reputation: 171

Traverse of multidimensional Array in any axis

I have a (kind of) performance problem in my code, that roots in the chosen architecture.

I will use multidimensional tensors (basically matrices with more dimensions) in the form of cubes to store my data. Since the dimension is not known at compile-time, I can't use Boost's MultidimensionalArray (IIRC), but have to come up, with my own solution.

Right now, I save each dimension, on it's own. I have a Tensor of dimension (let's say 3), that holds a lot of tensors of dimension 2 (in an std::vector), that each have a std::vector with tensors of dimension 1, that each holds a std::vector of (numerical) data. I use an abstract base-class for my tensor, so everything in there is a pointer to the abstract class, while beeing (secretly) multi- or one-dimensional.

I extract a single numerical data-point by giving a std::list of indices to a tensor, that get's the first element, searches for the according tensor and passes the rest of the list to that tensor in a (kind of) recursive call.

I now have to do a multi-dimensional Fast-Fourier Transformation on that data. I use a Threadpool and Job-Objects, that works on copying data from an Tensor along one dimension, doing an FFT and writes that data back.

I already have logic to implement ThreadPool and organize the dimensions to FFT along, but there is one problem:

My data-structure is the cache-unfriendliest beast, one can think of... While the Data-Copying along the first dimension (that, with it's data in a single 1D-Tensor) is reasonable fast, but in other directions, I need to copy my data from all over the place.

Since there are no race-conditions (I make sure every concurrent FFT is on distinct data-points), I thought, I would not use a Mutex-Guard to let everybody copy at the same time. However this heavily slows down the process ("I copy my data now!" - "No, I copy my data now!"- "But it's my turn now!"...)

Guarding the copy-Process with a mutex, does not increase speed. The FFT of a vector with 1024 elements is way faster, then the copy-process to get these elements, resulting in nearly all of my threads waiting, while one is copying.

Long story short: Is there any kind of multi-dimensional data-structure, that does not need to set the dimension at compile-time, that allows me to traverse fast along all axis? I searched for a while now, by nothing came up besides Boost MultiArray. Vectorization also does not work since the indices would grow too fast to hold in usual int-types.


I can't think of how to present code-examples here, since most of that code is rather simple, but If needed, I can get that in.

Upvotes: 2

Views: 300

Answers (1)

mdavezac
mdavezac

Reputation: 515

Eigen has multi-dimensional tensor support (nominally unsupported, but written by the DeepMind people, so "somewhat" supported?), and FFTW has 1d to 3d FFTs. Using external libraries with a set of 1D to 3D FFTs would outsource most of the hard work.

Edit: Actually, FFTW has support for threaded n-dimensional FFTs

Upvotes: 2

Related Questions