cs95
cs95

Reputation: 402493

Performing complicated matrix manipulation operations with cblas_sgemm in order to carry out multiplication

I have 100 3x3x3 matrices that I would like to multiply with another large matrix of size 3x5x5 (similar to convolving one image with multiple filters, but not quite).

For the sake of explanation, this is what my large matrix looks like:

>>> x = np.arange(75).reshape(3, 5, 5)
>>> x
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64],
        [65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74]]])

In memory, I assume all sub matrices in the large matrix are stored in contiguous locations (please correct me if I'm wrong). What I want to do is, from this 3x5x5 matrix, I want to extract 3 5x3 columns from each sub-matrix of the large matrix and then join them horizontally to get a 5x9 matrix (I apologise if this part is not clear, I can explain in more detail if need be). If I were using numpy, I'd do:

>>> k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))
>>> k
array([[ 0,  1,  2, 25, 26, 27, 50, 51, 52],
       [ 5,  6,  7, 30, 31, 32, 55, 56, 57],
       [10, 11, 12, 35, 36, 37, 60, 61, 62],
       [15, 16, 17, 40, 41, 42, 65, 66, 67],
       [20, 21, 22, 45, 46, 47, 70, 71, 72]])

However, I'm not using python so I do not have any access to the numpy functions that I need in order to reshape the data blocks into a form I want to carry out multiplication... I can only directly call the cblas_sgemm function (from the BLAS library) in C, where k corresponds to input B.

Here's my call to cblas_sgemm:

    cblas_sgemm(    CblasRowMajor, CblasNoTrans, CblasTrans, 
                    100, 5, 9, 
                    1.0,
                    A, 9,
                    B, 9, // this is actually wrong, since I don't know how to specify the right parameter
                    0.0,
                    result, 5);

Basically, the ldb attribute is the offender here, because my data is not blocked the way I need it to be. I have tried different things, but I am not able to get cblas_sgemm to understand how I want it to read and understand my data.

In short, I don't know how to tell cblas_sgemm to read x like k.Is there a way I can smartly reshape my data in python before sending it to C, so that cblas_sgemm can work the way I want it to?

I will transpose k by setting CblasTrans, so during multiplication, B is 9x5. My matrix A is of shape 100x9. Hope that helps.

Any help would be appreciated. Thanks!

Upvotes: 0

Views: 273

Answers (1)

Warren Weckesser
Warren Weckesser

Reputation: 114831

In short, I don't know how to tell cblas_sgemm to read x like k.

You can't. You'll have to make a copy.

Consider k:

In [20]: k
Out[20]: 
array([[ 0,  1,  2, 25, 26, 27, 50, 51, 52],
       [ 5,  6,  7, 30, 31, 32, 55, 56, 57],
       [10, 11, 12, 35, 36, 37, 60, 61, 62],
       [15, 16, 17, 40, 41, 42, 65, 66, 67],
       [20, 21, 22, 45, 46, 47, 70, 71, 72]])

In a two-dimensional array, the spacing of the elements in memory must be the same in each axis. You know from how x was created that the consecutive elements in memory are 0, 1, 2, 3, 4, ..., but your first row of k contains 0, 1, 2, 25, 26, ..... The is no spacing between 1 and 2 (i.e. the memory address increases by the size of one element of the array), but there is a large jump in memory between 2 and 25. So you'll have to make a copy to create k.


Having said that, there is an alternative method to efficiently achieve your desired final result using a bit of reshaping (without copying) and numpy's einsum function.

Here's an example. First define x and A:

In [52]: x = np.arange(75).reshape(3, 5, 5)

In [53]: A = np.arange(90).reshape(10, 9)

Here's my understanding of what you want to achieve; A.dot(k.T) is the desired result:

In [54]: k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))

In [55]: A.dot(k.T)
Out[55]: 
array([[ 1392,  1572,  1752,  1932,  2112],
       [ 3498,  4083,  4668,  5253,  5838],
       [ 5604,  6594,  7584,  8574,  9564],
       [ 7710,  9105, 10500, 11895, 13290],
       [ 9816, 11616, 13416, 15216, 17016],
       [11922, 14127, 16332, 18537, 20742],
       [14028, 16638, 19248, 21858, 24468],
       [16134, 19149, 22164, 25179, 28194],
       [18240, 21660, 25080, 28500, 31920],
       [20346, 24171, 27996, 31821, 35646]])

Here's how you can get the same result by slicing x and reshaping A:

In [56]: x2 = x[:,:,:3]

In [57]: A2 = A.reshape(-1, 3, 3)

In [58]: einsum('ijk,jlk', A2, x2)
Out[58]: 
array([[ 1392,  1572,  1752,  1932,  2112],
       [ 3498,  4083,  4668,  5253,  5838],
       [ 5604,  6594,  7584,  8574,  9564],
       [ 7710,  9105, 10500, 11895, 13290],
       [ 9816, 11616, 13416, 15216, 17016],
       [11922, 14127, 16332, 18537, 20742],
       [14028, 16638, 19248, 21858, 24468],
       [16134, 19149, 22164, 25179, 28194],
       [18240, 21660, 25080, 28500, 31920],
       [20346, 24171, 27996, 31821, 35646]])

Upvotes: 1

Related Questions