Cannon's algorithm of matrix multiplication

Question

I try to implement the Cannon's algorithm of matrix multiplication. I read description on the wikipedia that provides next pseudocode:

   row i of matrix a is circularly shifted by i elements to the left.
   col j of matrix b is circularly shifted by j elements up.
   Repeat n times:
       p[i][j] multiplies its two entries and adds to running total.
       circular shift each row of a 1 element left
       circular shift each col of b 1 element up

and I implemented it on the C# next way:

    public static void ShiftLeft(int[][] matrix, int i, int count)
    {
        int ind = 0;
        while (ind < count)
        {
            int temp = matrix[i][0];
            int indl = matrix[i].Length - 1;
            for (int j = 0; j < indl; j++)
                matrix[i][j] = matrix[i][j + 1];
            matrix[i][indl] = temp;
            ind++;
        }
    }
    public static void ShiftUp(int[][] matrix, int j, int count)
    {
        int ind = 0;
        while (ind < count)
        {
            int temp = matrix[0][j];
            int indl = matrix.Length - 1;
            for (int i = 0; i < indl; i++)
                matrix[i][j] = matrix[i + 1][j];
            matrix[indl][j] = temp;
            ind++;
        }
    }

    public static int[][] Cannon(int[][] A, int[][] B)
    {
        int[][] C = new int[A.Length][];
        for (int i = 0; i < C.Length; i++)
            C[i] = new int[A.Length];
        for (int i = 0; i < A.Length; i++)
            ShiftLeft(A, i, i);

        for (int i = 0; i < B.Length; i++)
            ShiftUp(B, i, i);

        for (int k = 0; k < A.Length; k++)
        {
            for (int i = 0; i < A.Length; i++)
            {
                for (int j = 0; j < B.Length; j++)
                {
                    var m = (i + j + k) % A.Length;
                    C[i][j] += A[i][m] * B[m][j];
                    ShiftLeft(A, i, 1);
                    ShiftUp(B, j, 1);
                }

            }
        };

        return C;

    }

this code return correct result, but do it very slowly. Much slowly even than naive algorithm of matrix multiplication.

For matrix 200x200 I got that result:

00:00:00.0490432 //naive algorithm
00:00:07.1397479 //Cannon's algorithm

What I am doing wrong?

Edit

Thanks SergeySlepov, it was bad attempt to do it parallel. When I back to sequential implementation I got next result:

Count       Naive               Cannon's

200    00:00:00.0492098    00:00:08.0465076

250    00:00:00.0908136    00:00:22.3891375

300    00:00:00.1477764    00:00:58.0640621

350    00:00:00.2639114    00:01:51.5545524

400    00:00:00.4323984    00:04:50.7260942

okay, it's not a parallel implementation, but how can I do it correctly?

Cannon's algorithm of matrix multiplication

Edit

Answers (1)

Related Questions

Cannon&#39;s algorithm of matrix multiplication

Edit

Answers (1)

Related Questions

Cannon's algorithm of matrix multiplication