FateOri
FateOri

Reputation: 79

Cannon's algorithm of matrix multiplication

I try to implement the Cannon's algorithm of matrix multiplication. I read description on the wikipedia that provides next pseudocode:

   row i of matrix a is circularly shifted by i elements to the left.
   col j of matrix b is circularly shifted by j elements up.
   Repeat n times:
       p[i][j] multiplies its two entries and adds to running total.
       circular shift each row of a 1 element left
       circular shift each col of b 1 element up

and I implemented it on the C# next way:

    public static void ShiftLeft(int[][] matrix, int i, int count)
    {
        int ind = 0;
        while (ind < count)
        {
            int temp = matrix[i][0];
            int indl = matrix[i].Length - 1;
            for (int j = 0; j < indl; j++)
                matrix[i][j] = matrix[i][j + 1];
            matrix[i][indl] = temp;
            ind++;
        }
    }
    public static void ShiftUp(int[][] matrix, int j, int count)
    {
        int ind = 0;
        while (ind < count)
        {
            int temp = matrix[0][j];
            int indl = matrix.Length - 1;
            for (int i = 0; i < indl; i++)
                matrix[i][j] = matrix[i + 1][j];
            matrix[indl][j] = temp;
            ind++;
        }
    }

    public static int[][] Cannon(int[][] A, int[][] B)
    {
        int[][] C = new int[A.Length][];
        for (int i = 0; i < C.Length; i++)
            C[i] = new int[A.Length];
        for (int i = 0; i < A.Length; i++)
            ShiftLeft(A, i, i);

        for (int i = 0; i < B.Length; i++)
            ShiftUp(B, i, i);

        for (int k = 0; k < A.Length; k++)
        {
            for (int i = 0; i < A.Length; i++)
            {
                for (int j = 0; j < B.Length; j++)
                {
                    var m = (i + j + k) % A.Length;
                    C[i][j] += A[i][m] * B[m][j];
                    ShiftLeft(A, i, 1);
                    ShiftUp(B, j, 1);
                }

            }
        };

        return C;

    }

this code return correct result, but do it very slowly. Much slowly even than naive algorithm of matrix multiplication.

For matrix 200x200 I got that result:

00:00:00.0490432 //naive algorithm
00:00:07.1397479 //Cannon's algorithm

What I am doing wrong?


Edit

Thanks SergeySlepov, it was bad attempt to do it parallel. When I back to sequential implementation I got next result:

Count       Naive               Cannon's

200    00:00:00.0492098    00:00:08.0465076

250    00:00:00.0908136    00:00:22.3891375

300    00:00:00.1477764    00:00:58.0640621

350    00:00:00.2639114    00:01:51.5545524

400    00:00:00.4323984    00:04:50.7260942

okay, it's not a parallel implementation, but how can I do it correctly?

Upvotes: 1

Views: 2905

Answers (1)

Sergey Slepov
Sergey Slepov

Reputation: 2111

Cannon's algorithm was built for a 'Distributed Memory Machine' (a grid of processors, each with its own memory). This is very different to the hardware you're running it on (a few processors with shared memory) and that is why you're not seeing any increase in performance.

The 'circular shifts' in the pseudocode that you quoted actually mimic data transfers between processors. After the initial matrix 'skewing', each processor in the grid keeps track of three numbers (a, b and c) and executes pseudocode similar to this:

c += a * b;
pass 'a' to the processor to your left (wrapping around)
pass 'b' to the processor to 'above' you (wrapping around)
wait for the next iteration of k

We could mimic this behaviour on a PC using NxN threads but the overhead of context switching (or spawning Tasks) would kill all the joy. To make the most of a PC's 4 (or so) CPUs we could make the loop over i parallel. The loop over k needs to be sequential (unlike your solution), otherwise you might face racing conditions as each iteration of k modifies the matrices A, B and C. In a 'distributed memory machine' race conditions are not a problem as processors do not share any memory.

Upvotes: 1

Related Questions