Reputation: 33
I'm writing templated matrix class, and I get stack overflows when returning by value from operators: +,-,* for larger matrices. I would prefer to somehow return by reference to relieve the stack and to avoid extra copying, but then, I would have to return an object constructed with new and break the general rule of "using delete for every new". I can't return by value because of copying overhead and stack limit problems, and I'm also unable to return by reference because of memory leaks, so what I should do then?
Here's my product function (Matrix contains 2D array elems):
template<typename T, unsigned int n, unsigned int m> template<unsigned int m2>
Matrix<T,n,m2> Matrix<T,n,m>::operator*(Matrix<T,m,m2>& M) {
T prod[n][m2];
if(n*m < GPUAccelerationThreshold)
for(int i = 0; i < n; i++)
for(int j = 0; j < m2; j++) {
prod[i][j] = elems[i][0] * M(0, j);
for(int p = 1; p < m; p++)
prod[i][j] += elems[i][p] * M(p, j);
}
else {
array_view<T, 2> product(n, m2, *prod);
array_view<T, 2> a(n, m, *elems);
array_view<T, 2> b(m, m2, M.elems[0]);
parallel_for_each(
product.extent,
[=](index<2> idx) restrict(amp) {
int row = idx[0];
int col = idx[1];
for (int inner = 0; inner < m; inner++) {
product[idx] += a(row, inner) * b(inner, col);
}
}
);
product.synchronize();
}
return Matrix<T,n,m2>(prod);
}
I'm writing this class because I want to boost some matrix operations on GPU(with MS amp). I've searched for an existing solution, found GPU accelerated linear algebra libs, but what I couldn't find in them was a simple matrix class with +,-,* operators. Maybe someone could recommend me any?
Upvotes: 3
Views: 1069
Reputation: 2240
Actually i can't fully understand your idea... Binary operators takes two arguments and create the result. In fact you can see that you are returning a new created object. So it's the only way to write programs: to allocate memory, to use it, and then delete it. In fact I even don't understand what your constructor does. If it simply copies the pointer to "prod" then the result matrix is broken when you return it from function because the "prod" memory will be deleted as soon as function returns(because it's created on a stack). So you can't return it by reference too.
The way i see the solution in to allocate memory in the matrix constructor. If you are making it as a template depending on matrix size, the size of the matrix is known from template parameters (I find it quite strange to make templates with matrix size as argument.. What is the point of that?). So you allocate memory in constructor with "new" and delete it in destructor with "delete". So this memory will be allocated according to RAII methodology that works pretty well in OOP. Then you implement methods such as setElement(i, j, value) and set the elements in new created matrix in your binary operator and return it.
There are some problems however that i want you to take care of. The copy constructor must really copy the matrix not just a pointer(or several destructors will try to destroy the same memory) or you can program the "lazy copy" model that actually copies the matrix on changing (see wiki). Or you can make the copy constructor private without realization (to prevent copying at all). If you are not allowed to create such methods as "setElement" because you don't want the user of your library to change matrix values, you can access to private data and methods (even in objects that we got as an argument or newly created) in such operators because you are inside the class method.
If you need to pass raw pointer to other calculation function as it's done in "else" part, you can make a constructor from a pointer that will copy just a pointer (but this is dangerous way. If you pass a pointer you must not access it nowhere because the matrix class is now the boss) or copy data totally(wich is slower but you can pass there pointers from stack or pointers you need to take control after) depending on your wish, and destructor will clean it when matrix destroys. Or you even can create private method such as "getRawMatrix()" which will return the raw pointer to data from matrix and pass this pointer to your calculation functions or even simply get the raw data pointer because you are inside a method of the matrix class.
Usually you allocate memory in a constructor and make "lazy copy" model because matrices can be huge. Inside the class is allowed to access the private data and members, that's the classes are made. I hope it will be usefull..
Upvotes: 1
Reputation: 153977
Three quick comments:
Matrix
classes have used dynamic
allocation. You don't show your Matrix
class, but if your
data is:
T myData[n][m];you might want to change it to:
std::vector myData;, initializing it to the size
n * m
in the constructor, and
calculating the single index in the operator[]
(which should
return a proxy if you want to do any bounds checking).
Alternatively, you can use operator()( int i, int j )
for
accessing an element: whether myMatrix( i, j )
or
myMatrix[i][j]
is prefered for access depends on who you ask.
Although this solution increases total memory use slightly (but
very slightly), it reduces the stack footprint to a couple of
dozen bytes, regardless of the size of the matrix.
T
have the
same type. You can thus access the internals of the matrix you
return from your member function, and you no longer need the
intermediate T prod[n][m2]
. Of course, you could make all
instantiations of Matrix
friend, or simply use the access
functions to set the values. At any rate, you do not want an
intermediate T prod[n][m2]
; this not only requires a lot of on
stack memory, it means that you'll have to copy the results.
operator*
does not return a matrix, but a helper
class, along the lines of:
template
class MatrixMultiply
{
L const* myLhs;
R const* myRhs;
public:
typedef T value_type;
MatrixMultiply( L const& lhs, R const& rhs )
: myLhs( &lhs )
, myRhs( &rhs )
{
}
int getX() const
{
return myLhs->getX();
}
int getY() const
{
return myRhs->getY();
}
T get( int i, int j ) const
{
return calculateIJ( myLhs, myRhs );
}
};
You then provide a templated constructor and assignment operator
which uses getX()
, getY()
and get( i, j )
. Your
operator*
is also a template, which returns a
MatrixMultiply
:
template
MatrixMultiply
operator*( L const& lhs, R const& rhs )
{
return MatrixMultiply( lhs, rhs );
}
(Note that if L::value_type
and R::value_type
aren't
identical, this won't compile. Which is what you want, except
that the error messages will be far from clear.)
The result is that you never actually build the intermediate,
temporary matrices.
As you can imagine, the above solution is greatly simplified.
You'll need additional code for error handling, and I don't
think that the parallelization is trivial. But it avoids the
construction of all intermediate matrices, even in complicated
expressions.
(The same technique can be used using an abstract base class,
say MatrixAccessor
, with pure virtual getters, and deriving
Matrix
and all of the helpers like MatrixMultiply
from
it. IMHO, this is a lot more readable, and the error messages
from the compiler will definitely be more understandable. The results will be the
same as long as the compiler actually inlines all of the member
functions. But that's a big if, since there can be
significant function nesting.)
Upvotes: 3
Reputation: 129454
There is no easy way to solve this. You can't return stack local variables as reference, because the memory "behind" the variable will go away when you return. So you have to have some dedicated storage somewhere. It doesn't HAVE to come from new/delete, but you do need to have some sort of storage when making copies of the data.
One solution would of course to have three-operand operation, so instead of:
a = b + c;
you use a function:
add(a, b, c);
where a, b and c are references.
It does make the code a lot messier, but I can't think of any more obvious way to solve the problem - if you don't want to write your own allocator/delete function (or a garbage collector).
Upvotes: 1