skinhat
skinhat

Reputation: 9

Speed up multiplying two arrays for convolution

I've got an application and Ive done some speed profiling on my code and the limiting factor is the multiplication of two arrays. I've got one array X as follows (in my real code is it a 2000 elements long not 15) :

a b c d e f g h i j k l m n o

and a second array Y (in my real code it is 300 elements long not 6) as follows :

A B C D E F

I then multiply the arrays as follows to create a new array C:

value 1  A*a+B*b+C*c+D*d+E*e+F*f
value 2  A*b+B*c+C*d+D*e+E*f+F*g
value 3  A*c+B*d+C*e+D*f+E*g+F*h
value 4  A*d+B*e+C*f+D*g+E*h+F*i
value 5  A*e+B*f+C*g+D*h+E*i+F*j
value 6  A*f+B*g+C*h+D*i+E*j+F*k
value 7  A*g+B*h+C*i+D*j+E*k+F*l
value 8  A*h+B*i+C*j+D*k+E*l+F*m
value 9  A*i+B*j+C*k+D*l+E*m+F*n
value 10 A*j+B*k+C*l+D*m+E*n+F*o

I was wondering if there was a way to speed this code up. I currently just use the code:

for (int l=0; l<X.length; l++) {
    val=0;
    for (int i=0; i<X.length; i++)
        for (int j=0; j<J.length; j++)
            val+=X[i]*Y[j];
    C[l]=val;
}

I though I could maybe use Fourier transforms but it is slower than multiplying it out.

Upvotes: 0

Views: 466

Answers (1)

Tam&#225;s Zahola
Tam&#225;s Zahola

Reputation: 9311

I would recommend using a BLAS implementation, which is the de facto standard scientific library for performing basic vector/matrix operations. There's probably a highly optimized version of BLAS available for your architecture of choice, using SIMD instructions (if there are any) and taking care of pipeline saturation to achieve maximum throughput. For example: OS X and iOS ships with a BLAS implementation in the Accelerate.framework.

(Please also note, that your code doesn't do convolution. val only depends on i and j, so each element of C will be the same: the sum of the products of each pair from X and J. )

Upvotes: 0

Related Questions