Reputation: 2855

Optimizing C loops

I'm new to C from many years of Matlab for numerical programming. I've developed a program to solve a large system of differential equations, but I'm pretty sure I've done something stupid as, after profiling the code, I was surprised to see three loops that were taking ~90% of the computation time, despite the fact they are performing the most trivial steps of the program.

My question is in three parts based on these expensive loops:

Initialization of an array to zero. When J is declared to be a double array are the values of the array initialized to zero? If not, is there a fast way to set all the elements to zero?

void spam(){
    double J[151][151];    
    /* Other relevant variables declared */
    calcJac(data,J,y);
    /* Use J */
}

static void calcJac(UserData data, double J[151][151],N_Vector y)
{
    /* The first expensive loop */
    int iter, jter;
    for (iter=0; iter<151; iter++) {
        for (jter = 0; jter<151; jter++) {
            J[iter][jter] = 0;
        }
    }
   /* More code to populate J from data and y that runs very quickly */
}

During the course of solving I need to solve matrix equations defined by P = I - gamma*J. The construction of P takes longer than solving the system of equations it defines, so I am likely doing something wrong. In the relatively slow loop below, is accessing a matrix that is contained in a structure 'data' the slow component, or is it something else about the loop?
```
for (iter = 1; iter<151; iter++) {
    for(jter = 1; jter<151; jter++){
        P[iter-1][jter-1] = - gamma*(data->J[iter][jter]);
    }
}
```
Is there a best practice for matrix multiplication? In the loop below, Ith(v,iter) is a macro for getting the iter-th component of a vector held in the N_Vector structure 'v' (a data type used by the Sundials solvers). Particularly, is there a best way to get the dot product between v and the rows of J?
```
Jv_scratch = 0;
int iter, jter;
for (iter=1; iter<151; iter++) {
    for (jter=1; jter<151; jter++) {
        Jv_scratch += J[iter][jter]*Ith(v,jter);
    }
    Ith(Jv,iter) = Jv_scratch;
    Jv_scratch = 0;
}
```

Upvotes: 0

Answers (4)

Lundin

Reputation: 215115

Initialization of an array to zero. When J is declared to be a double array are the values of the array initialized to zero? If not, is there a fast way to set all the elements to zero?

It depends on where the array is allocated. If it is declared at file scope, or as static, the C standard guarantees that all elements are set to zero. The same is guaranteed for the remaining elements if you explicitly initialize the first element, i.e.:

double J[151][151] = {0}; /* set first element to zero */

When you initialize the first element explicitly, the C standard guarantees that all remaining elements of the array are set to zero, as if the array had static storage duration.

In practice, for this specific case, I very much doubt it is wise to allocate 151*151*sizeof(double) bytes (about 178 KB with 8-byte doubles) on the stack, no matter which system you are using. You will likely have to allocate it dynamically, and then none of the above applies: you must use memset() to set all bytes to zero, or allocate with calloc(), which returns zeroed memory.
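To make the two options above concrete, here is a minimal sketch. The names `Row`, `sum_all`, `stack_sum` and `alloc_zeroed_matrix` are made up for illustration and are not from the question. It assumes an IEEE 754 platform, where all-zero bytes represent 0.0.

```c
#include <stdlib.h>

#define N 151                 /* matrix dimension taken from the question */

typedef double Row[N];        /* hypothetical helper type: one matrix row */

/* Sum of all elements; used here only to check that a matrix is all zeros. */
static double sum_all(Row *m)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Automatic (stack) array: the initializer zeroes every element. */
static double stack_sum(void)
{
    double J[N][N] = {0};
    return sum_all(J);
}

/* Heap array: calloc returns memory with all bytes set to zero, so no
 * separate memset is needed. Returns NULL if the allocation fails;
 * the caller releases the matrix with free(). */
static Row *alloc_zeroed_matrix(void)
{
    return calloc(N, sizeof(Row));
}
```

A matrix obtained from `alloc_zeroed_matrix()` can still be indexed as `J[i][j]`, because it is a pointer to rows of `N` doubles.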

In the relatively slow loop below, is accessing a matrix that is contained in a structure 'data' the slow component or is it something else about the loop?

You should ensure that the function called from it is inlined. Otherwise there isn't much else you can do to optimize the loop: what is optimal is highly system-dependent (i.e. it depends on how the physical cache memories are built). It is best to leave such optimization to the compiler.

You could of course obfuscate the code with manual optimizations such as counting down towards zero rather than up, or using ++i rather than i++, and so on. But the compiler really should be able to handle such things for you.

As for matrix multiplication, I don't know the mathematically most efficient way, but I suspect it is of minor relevance to the efficiency of the code. The big time thief here is the double type. Unless you really need high accuracy, I'd consider using float or int to speed up the algorithm.

Upvotes: 0

Oliver Charlesworth

Reputation: 272762

Others have already answered some of your questions. On the subject of matrix multiplication: it is difficult to write a fast algorithm for this unless you know a lot about cache architecture and so on (the slowness is caused by the order in which you access array elements, which can cause thousands of cache misses).

You can try Googling for terms like "matrix-multiplication", "cache", "blocking" if you want to learn about the techniques used in fast libraries. But my advice is to just use a pre-existing maths library if performance is key.
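As a small illustration of what "access order" means here, the sketch below sums a matrix in two ways. The function names are invented for this example. C stores 2D arrays row by row, so the first version touches adjacent memory on every step, while the second jumps a full row ahead each time. Both return the same value; only the memory access pattern differs. Whether the difference is measurable depends on matrix size, cache and compiler, so treat it as something to profile rather than a guaranteed win.

```c
#include <stddef.h>

#define N 151   /* matrix dimension taken from the question */

/* Row-wise traversal: the inner loop walks the last index, so
 * consecutive accesses are adjacent in memory (row-major layout). */
static double sum_row_order(double m[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-wise traversal: each inner-loop step skips N doubles ahead,
 * which is the access pattern that tends to miss the cache. */
static double sum_column_order(double m[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```

Note that the loops in the question already run the inner index over the last dimension, which is the cache-friendly order.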

Upvotes: 1

Goz

Reputation: 62333

1) No, they're not. You can memset the array as follows:

memset( J, 0, sizeof( double ) * 151 * 151 );

or you can use an array initialiser:

double J[151][151] = { 0.0 };

2) Well, each access involves a fairly complex index calculation to locate the element of P and the element of J.

You may well get better performance by stepping through with pointers:

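for (iter = 1; iter < 151; iter++)
{
    /* Row start addresses are computed once per row, not per element. */
    double* pP = &P[iter - 1][0];      /* first element of the target row in P */
    double* pJ = &data->J[iter][1];    /* column 1 of the source row in J */

    for (jter = 1; jter < 151; jter++, pP++, pJ++)
    {
        *pP = -gamma * *pJ;
    }
}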

This way you move much of the array index calculation out of the inner loop.

3) The best practice is to try and move as many calculations out of the loop as possible. Much like I did on the loop above.
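Applied to the matrix-vector product from the question, that idea could look roughly like the sketch below. It is only an illustration: it uses plain `double` arrays instead of `N_Vector` and the `Ith` macro, it covers the full index range 0..N-1 rather than 1..150, and the name `mat_vec` is made up.

```c
#include <stddef.h>

#define N 151   /* matrix dimension taken from the question */

/* Computes out[i] = sum over j of J[i][j] * v[j]. */
static void mat_vec(double J[N][N], const double *v, double *out)
{
    for (size_t i = 0; i < N; i++) {
        const double *row = J[i];  /* row address computed once per row */
        double acc = 0.0;          /* local accumulator, stored once per row */
        for (size_t j = 0; j < N; j++)
            acc += row[j] * v[j];
        out[i] = acc;
    }
}
```

An optimizing compiler will often perform this hoisting on its own, so it is worth profiling before and after to see whether the manual version makes any difference.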

Upvotes: 4

bdonlan

Reputation: 231443

First, I'd advise you to split up your question into three separate questions. It's hard to answer all three; I, for example, have not worked much with numerical analysis, so I'll only answer the first one.

Variables on the stack are not initialized for you, but there are faster ways to initialize them than an explicit loop. In your case I'd advise using memset:

static void calcJac(UserData data, double J[151][151],N_Vector y)
{
   memset((void*)J, 0, sizeof(double) * 151 * 151);
   /* More code to populate J from data and y that runs very quickly */
}

memset is a fast library routine that fills a region of memory with a given byte value. It just so happens that setting all bytes of a double to zero sets the double to zero (this holds for IEEE 754 doubles, which virtually all current platforms use), so take advantage of your library's fast routines (which will likely be written in assembler to take advantage of things like SSE).
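One pitfall worth spelling out when the size is computed with sizeof instead of being written out by hand: inside the function, the parameter `double J[151][151]` is really a pointer to the first row, so `sizeof(J)` would give the size of a pointer, not of the matrix. A minimal sketch that avoids this (the name `zero_matrix` is invented for the example):

```c
#include <string.h>

#define N 151   /* matrix dimension taken from the question */

/* As a parameter, J has type double (*)[N], a pointer to the first row.
 * sizeof J would therefore be the size of a pointer, but sizeof J[0]
 * is the size of one full row (N doubles). */
static void zero_matrix(double J[N][N])
{
    memset(J, 0, N * sizeof J[0]);
}
```

This clears the same number of bytes as `sizeof(double) * 151 * 151`, but keeps working if the element type changes.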

Upvotes: 3
