Reputation: 23
I'm trying to implement the code shown in this pdf. More precisely (page 50):
#define SM (CLS / sizeof (double))
for (i = 0; i < N; i += SM)
for (j = 0; j < N; j += SM)
for (k = 0; k < N; k += SM)
for (i2 = 0, rres = &res[i][j],
rmul1 = &mul1[i][k]; i2 < SM;
++i2, rres += N, rmul1 += N)
for (k2 = 0, rmul2 = &mul2[k][j];
k2 < SM; ++k2, rmul2 += N)
for (j2 = 0; j2 < SM; ++j2)
rres[j2] += rmul1[k2] * rmul2[j2];
Now, as far as I'm concerned, rres
is int*
, rmul2
and rmul1
too.
Should it look like
int *rres;
int *rmul1;
int *rmul2;
for (i = 0; i < N; i += SM)
for (j = 0; j < N; j += SM)
for (k = 0; k < N; k += SM)
for (i2 = 0, *rres = &res[i][j],
*rmul1 = &mul1[i][k]; i2 < SM;
++i2, rres += N, rmul1 += N)
for (k2 = 0, *rmul2 = &mul2[k][j];
k2 < SM; ++k2, rmul2 += N)
for (j2 = 0; j2 < SM; ++j2)
rres[j2] += rmul1[k2] * rmul2[j2];
Because this seems more or less reasonable to me bit gives wrong results. For example, if I have two matrices 2 x 2, randoms values 0 or 1, I get:
-1520010527 23996350
212687419 207125308
which is far from good. My guess is that I use *
wrong but I can't tell where...
EDIT:
Declaration of res
:
int **res = (int **)malloc(N * sizeof(int *));
for (int i = 0; i < N; i++) {
res[i] = (int *)malloc(N * sizeof(int));
}
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
res[i][j] = 0;
}
}
Upvotes: 2
Views: 122
Reputation: 144695
Your assumption about the type of rres
, rmul2
and rmul1
seems incorrect: the matrix elements probably have double
type, and the expressions in the initial clauses of the for
loops should definitely not have an initial *
.
Yet this matrix multiplication code should work the same for any element type, but the cache optimisation explained by Ulrich Drepper makes another assumption: N
must be a multiple of SM
, and it is unlikely to be the case with #define SM (CLS / sizeof(double))
and N = 2
in your test case, which might explain your observations.
The code should be:
#define SM (CLS / sizeof(double))
void matrix_multiply(double res[N][N], const double mul1[N][N], const double mul2[N][N]) {
size_t i, j, k, i2, j2, k2;
double *rres, *rmul1, *rmul2;
assert(N % SM == 0);
for (i = 0; i < N; i += SM) {
for (j = 0; j < N; j += SM) {
for (k = 0; k < N; k += SM) {
for (i2 = 0, rres = &res[i][j], rmul1 = &mul1[i][k];
i2 < SM; ++i2, rres += N, rmul1 += N) {
for (k2 = 0, rmul2 = &mul2[k][j];
k2 < SM; ++k2, rmul2 += N) {
for (j2 = 0; j2 < SM; ++j2)
rres[j2] += rmul1[k2] * rmul2[j2];
}
}
}
}
}
}
Upvotes: 3