Reputation: 6231
Below is my C wrapper for a Fortran ZHEEVR routine from well-known LAPACK numerical library:
void zheevr(char jobz, char range, char uplo, int n, doublecomplex* a, int lda, double vl, double vu, int il, int iu, double abstol, double* w, doublecomplex* z, int ldz, int* info)
{
int m;
int lwork = -1;
int liwork = -1;
int lrwork = -1;
int* isuppz = alloc_memory(sizeof(int) * 2 * n);
zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, small_work_doublecomplex, &lwork, small_work_double, &lrwork, small_work_int, &liwork, &info);
lwork = (int) small_work_doublecomplex[0].real;
liwork = small_work_int[0];
lrwork = (int) small_work_double[0];
doublecomplex* work = alloc_memory(sizeof(doublecomplex) * lwork);
double* rwork = alloc_memory(sizeof(double) * lrwork);
int* iwork = alloc_memory(sizeof(int) * liwork);
zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, work, &lwork, rwork, &lrwork, iwork, &liwork, info);
free(iwork);
free(rwork);
free(work);
free(isuppz);
}
This function is called hundreds of thousands of times in my application, to diagonalize the complex matrix "a" (parameter names follow the Fortran convention for this function) for the same matrix size. I think that the work arrays sizes will be the same most of the time, as the diagonalized matrices will be of the same structure. My questions are:
Upvotes: 1
Views: 899
Reputation: 754490
If both answers are 'yes', consider using VLA's - variable length arrays:
void zheevr(char jobz, char range, char uplo, int n, doublecomplex* a, int lda, double vl, double vu, int il, int iu, double abstol, double* w, doublecomplex* z, int ldz, int* info)
{
int m;
int lwork = -1;
int liwork = -1;
int lrwork = -1;
int isuppz[2*n];
zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, small_work_doublecomplex, &lwork, small_work_double, &lrwork, small_work_int, &liwork, &info);
lwork = (int) small_work_doublecomplex[0].real;
liwork = small_work_int[0];
lrwork = (int) small_work_double[0];
doublecomplex work[lwork];
double rwork[lrwork];
int iwork[liwork];
zheevr_(&jobz, &range, &uplo, &n, a, &lda, &vl, &vu, &il, &iu, &abstol, &m, w, z, &ldz, isuppz, work, &lwork, rwork, &lrwork, iwork, &liwork, info);
}
One nice thing about using VLAs is that there is no freeing to be done by you.
(Untested code!)
Upvotes: 5
Reputation: 10184
If you are allocating the same size item hundreds of thousands of times, then why not just maintain a heap of your objects (since these seem to be relatively simple, i.e. don't contain pointers to other allocated memory) and free onto your own heap (or stack actually)?
The heap can lazily allocate new objects using the glib malloc, but when freeing just push the item onto the heap. When you need to allocate, if there is a freed object available it can just allocate that one.
This will also save you multiple calls to allocation (since you won't need to do any allocation and it looks like your routine makes several calls to malloc) and will also avoid fragmentation (to some extent) at least on the re-used memory. Of course the initial allocations (and other allocations as the program is running when it needs to expand this memory) may cause fragmentation, but if you are really worried about this you can run some stats and find the average/max/typical size of your heap during runs and pre-allocate this at once when the program starts up, avoiding fragmentation.
Upvotes: 1
Reputation: 1591
1) Yes they can.
2) Any sane libc shouldn't worry about order of free(). Performance wise that shouldn't matter too.
I'd recommend removing memory management from this function -- so caller will be supplying the matrix size and allocated temporary buffers. That'll cut number of mallocs significantly if this function is called from same place on matrix of the same size.
Upvotes: 5
Reputation: 900
Alright. You are going to get the profiler answer anytime soon. If you have an AMD machine, I strongly recommend the free AMD's CodeAnalyst.
As for your memory problem, I think that you could work with local memory management in this case. Just determine the maximum number of memory that you can allocate for this function. Next you declare a static buffer and you work with it a bit like how a compiler handles the stack. I did a wrapper like this over VirtualAlloc once and it's VERY fast.
Upvotes: 1
Reputation:
It certainly will affect performance - how much youb can only find out for sure by timing. To create a version that avoids most allocations, allocate to a static pointer and remember the size in another static integer. If the next call uses the same size, just reuse what was allocated last time. Only free anything when you need to create a new matrix because the size has changed.
Note this solution is only suitable for single-threaded code.
Upvotes: 2