Reputation: 876
I want to load N-dimensional matrices from disk (HDF5) into std::vector
objects.
I know their rank beforehand, just not the shape. For instance, one of the matrices is 4-rank std::vector<std::vector<std::vector<std::vector<float>>>> data;
I want to use vectors to store the values because they are standard and not as ugly as c-arrays (mostly because they are aware of their length).
However, the way to load them is using a loading function that takes a void *
, which would work fine for rank 1 vectors where I can just resize them and then access its data pointer (vector.data()
). For higher ranks, vector.data()
will just point to vector
s, not the actual data.
Worst case scenario I just load all the data to an auxiliary c-array and then copy it manually but this could slow it down quite a bit for big matrices.
Is there a way to have contiguous multidimensional data in vectors and then get a single address to it?
Upvotes: 3
Views: 359
Reputation: 275730
Your plan is not a wise one. Vectors of vectors of vectors are inefficient and only really useful for dynamic jagged arrays, which you don't have.
Instead of your plan, load into a flst vector.
Next, wrap it with a multidimensional view.
template<class T, size_t Dim>
struct dimensional{
size_t const* strides;
T* data;
dimensional<T, Dim-1> operator[](size_t i)const{
return {strides+1, data+i* *strides};
}
};
template<class T>
struct dimensional<T,0>{
size_t const* strides; // not valid to dereference
T* data;
T& operator[](size_t i)const{
return data[i];
}
};
where strides
points at an array of array-strides for each dimension (the product of the sizes of all later dimensions).
So my_data.access()[3][5][2]
gets a specific element.
This sketch of a solution leaves everything public, and doesn't support for(:)
iteration. A more shipping quality one would have proper privacy and support c++11 style for loops.
I am unaware of the name of a high quality multi-dimensional array view already written for you, but there is almost certainly one in boost.
Upvotes: 3
Reputation: 371
For a bi-dimensional matrix, you could use an ugly c-array like that:
float data[w * h]; //width, height
data[(y * w) + x] = 0; //access (x,y) element
For a tri-dimensional matrix:
float data[w * h * d]; //width, height, depth
data[((z * h) + y) * w + x] = 0; //access (x,y,z) element
And so on. To load data from, let's say, a file,
float *data = yourProcToLoadData(); //works for any dimension
That's not very scalable but you deal with a known dimension. This way your data is contiguous and you have a single address.
Upvotes: 0
Reputation: 5714
If you are concerned about performance please don't use a vector of vector of vector... .
Here is why. I think the answer of @OldPeculier is worth reading.
The reason that it's both fat and slow is actually the same. Each "row" in the matrix is a separately allocated dynamic array. Making a heap allocation is expensive both in time and space. The allocator takes time to make the allocation, sometimes running O(n) algorithms to do it. And the allocator "pads" each of your row arrays with extra bytes for bookkeeping and alignment. That extra space costs...well...extra space. The deallocator will also take extra time when you go to deallocate the matrix, painstakingly free-ing up each individual row allocation. Gets me in a sweat just thinking about it.
There's another reason it's slow. These separate allocations tend to live in discontinuous parts of memory. One row may be at address 1,000, another at address 100,000—you get the idea. This means that when you're traversing the matrix, you're leaping through memory like a wild person. This tends to result in cache misses that vastly slow down your processing time.
So, if you absolute must have your cute [x][y] indexing syntax, use that solution. If you want quickness and smallness (and if you don't care about those, why are you working in C++?), you need a different solution.
Upvotes: 5