Nerwena
Nerwena

Reputation: 23

Should there be pointers?

I'm trying to implement the code shown in this pdf. More precisely (page 50):

#define SM (CLS / sizeof (double))

for (i = 0; i < N; i += SM)
  for (j = 0; j < N; j += SM)
    for (k = 0; k < N; k += SM)
      for (i2 = 0, rres = &res[i][j],
           rmul1 = &mul1[i][k]; i2 < SM;
           ++i2, rres += N, rmul1 += N)
         for (k2 = 0, rmul2 = &mul2[k][j];
              k2 < SM; ++k2, rmul2 += N)
           for (j2 = 0; j2 < SM; ++j2)
              rres[j2] += rmul1[k2] * rmul2[j2];

Now, as far as I'm concerned, rres is int*, rmul2 and rmul1 too.

Should it look like

int *rres;
int *rmul1;
int *rmul2;

for (i = 0; i < N; i += SM)
   for (j = 0; j < N; j += SM)
      for (k = 0; k < N; k += SM)
         for (i2 = 0, *rres = &res[i][j],
              *rmul1 = &mul1[i][k]; i2 < SM;
              ++i2, rres += N, rmul1 += N)
            for (k2 = 0, *rmul2 = &mul2[k][j];
                 k2 < SM; ++k2, rmul2 += N)
               for (j2 = 0; j2 < SM; ++j2)
                  rres[j2] += rmul1[k2] * rmul2[j2];

Because this seems more or less reasonable to me bit gives wrong results. For example, if I have two matrices 2 x 2, randoms values 0 or 1, I get:

-1520010527 23996350 
212687419 207125308

which is far from good. My guess is that I use * wrong but I can't tell where...

EDIT: Declaration of res:

  int **res = (int **)malloc(N * sizeof(int *));
  for (int i = 0; i < N; i++) {
      res[i] = (int *)malloc(N * sizeof(int));
  }
  for (int i = 0; i < N; i++) {
      for (int j = 0; j < N; j++) {
          res[i][j] = 0;
      }
  }

Upvotes: 2

Views: 122

Answers (1)

chqrlie
chqrlie

Reputation: 144695

Your assumption about the type of rres, rmul2 and rmul1 seems incorrect: the matrix elements probably have double type, and the expressions in the initial clauses of the for loops should definitely not have an initial *.

Yet this matrix multiplication code should work the same for any element type, but the cache optimisation explained by Ulrich Drepper makes another assumption: N must be a multiple of SM, and it is unlikely to be the case with #define SM (CLS / sizeof(double)) and N = 2 in your test case, which might explain your observations.

The code should be:

#define SM  (CLS / sizeof(double))

void matrix_multiply(double res[N][N], const double mul1[N][N], const double mul2[N][N]) {
    size_t i, j, k, i2, j2, k2;
    double *rres, *rmul1, *rmul2;
    
    assert(N % SM == 0);

    for (i = 0; i < N; i += SM) {
        for (j = 0; j < N; j += SM) {
            for (k = 0; k < N; k += SM) {
                for (i2 = 0, rres = &res[i][j], rmul1 = &mul1[i][k];
                     i2 < SM; ++i2, rres += N, rmul1 += N) {
                    for (k2 = 0, rmul2 = &mul2[k][j];
                         k2 < SM; ++k2, rmul2 += N) {
                        for (j2 = 0; j2 < SM; ++j2)
                            rres[j2] += rmul1[k2] * rmul2[j2];
                    }
                }
            }
        }
    }
}

Upvotes: 3

Related Questions