Reputation: 57
I have a randomly defined H
matrix of size 600 x 10
. Each element in this matrix H
can be represented as H(k,t)
. I obtained a speech spectrogram S
which is 600 x 597
. I obtained it using Mel features, so it should be 40 x 611
but then I used a frame stacking concept in which I stacked 15 frames together. Therefore it gave me (40x15) x (611-15+1)
which is 600 x 597
.
Now I want to obtain an output matrix Y
which is given by the equation based on convolution Y(k,t) = ∑ H(k,τ)S(k,t-τ)
. The sum goes from τ=0
to τ=Lh-1
. Lh
in this case would be 597.
I don't know how to obtain Y
. Also, my doubt is the indexing into both H
and S
when computing the convolution. Specifically, for Y(1,1)
, we have:
Y(1,1) = H(1,0)S(1,1) + H(1,1)S(1,0) + H(1,2)S(1,-1) + H(1,3)S(1,-2) + ...
Now, there is no such thing as negative indices in MATLAB - for example, S(1,-1) S(1,-2)
and so on. So, what type of convolution should I use to obtain Y
? I tried using conv2
or fftfilt
but I think that will not give me Y
because Y
must also be the size of S
.
Upvotes: 2
Views: 2115
Reputation: 104464
That's very easy. That's a convolution on a 2D signal only being applied to 1 dimension. If we assume that the variable k
is used to access the rows and t
is used to access the columns, you can consider each row of H
and S
as separate signals where each row of S
is a 1D signal and each row of H
is a convolution kernel.
There are two ways you can approach this problem.
If you want to stick with time domain, the easiest thing would be to loop over each row of the output, find the convolution of each pair of rows of S
and H
and store the output in the corresponding output row. From what I can tell, there is no utility that can convolve in one dimension only given an N-D signal.... unless you go into frequency domain stuff, but let's leave that for later.
Something like:
Y = zeros(size(S));
for idx = 1 : size(Y,1)
Y(idx,:) = conv(S(idx,:), H(idx,:), 'same');
end
For each row of the output, we perform a row-wise convolution with a row of S
and a row of H
. I use the 'same'
flag because the output should be the same size as a row of S
... which is the bigger row.
You can also perform the same computation in frequency domain. If you know anything about the properties of convolution and the Fourier Transform, you know that convolution in time domain is multiplication in the frequency domain. You take the Fourier Transform of both signals, multiply them element-wise, then take the Inverse Fourier Transform back.
However, you need to keep the following intricacies in mind:
Performing a full convolution means that the final length of the output signal is length(A)+length(B)-1
, assuming A
and B
are 1D signals. Therefore, you need to make sure that both A
and B
are zero-padded so that they both match the same size. The reason why you make sure that the signals are the same size is to allow for the multiplication operation to work.
Once you multiply the signals in the frequency domain then take the inverse, you will see that each row of Y
is the full length of the convolution. To ensure that you get an output that is the same size as the input, you need to trim off some points at the beginning and at the end. Specifically, since each kernel / column length of H
is 10, you would have to remove the first 5 and last 5 points of each signal in the output to match what you get in the for
loop code.
Usually after the inverse Fourier Transform, there are some residual complex coefficients due to the nature of the FFT algorithm. It's good practice to use real
to remove the complex valued parts of the results.
Putting all of this theory together, this is what the code would look like:
%// Define zero-padded H and S matrices
%// Rows are the same, but columns must be padded to match point #1
H2 = zeros(size(H,1), size(H,2)+size(S,2)-1);
S2 = zeros(size(S,1), size(H,2)+size(S,2)-1);
%// Place H and S at the beginning and leave the rest of the columns zero
H2(:,1:size(H,2)) = H;
S2(:,1:size(S,2)) = S;
%// Perform Fourier Transform on each row separately of padded matrices
Hfft = fft(H2, [], 2);
Sfft = fft(S2, [], 2);
%// Perform convolution
Yfft = Hfft .* Sfft;
%// Take inverse Fourier Transform and convert to real
Y2 = real(ifft(Yfft, [], 2));
%// Trim off unnecessary values
Y2 = Y2(:,size(H,2)/2 + 1 : end - size(H,2)/2 + 1);
Y2
should be the convolved result and should match Y
in the previous for
loop code.
If you actually want to compare them, we can. What we'll need to do first is define H
and S
. To reconstruct what I did, I generated random values with a known seed:
rng(123);
H = rand(600,10);
S = rand(600,597);
Once we run the above code for both the time domain version and frequency domain version, let's see how they match up in the command prompt. Let's show the first 5 rows and 5 columns:
>> format long g;
>> Y(1:5,1:5)
ans =
1.63740867892464 1.94924208172753 2.38365646354643 2.05455605619097 2.21772526557861
2.04478411247085 2.15915645246324 2.13672842742653 2.07661341840867 2.61567534623066
0.987777477630861 1.3969752201781 2.46239452105228 3.07699790208937 3.04588738611503
1.36555260994797 1.48506871890027 1.69896157726456 1.82433906982894 1.62526864072424
1.52085236885395 2.53506897420001 2.36780282057747 2.22335617436888 3.04025523335182
>> Y2(1:5,1:5)
ans =
1.63740867892464 1.94924208172753 2.38365646354643 2.05455605619097 2.21772526557861
2.04478411247085 2.15915645246324 2.13672842742653 2.07661341840867 2.61567534623066
0.987777477630861 1.3969752201781 2.46239452105228 3.07699790208937 3.04588738611503
1.36555260994797 1.48506871890027 1.69896157726456 1.82433906982894 1.62526864072424
1.52085236885395 2.53506897420001 2.36780282057747 2.22335617436888 3.04025523335182
Looks good to me! As another measure, let's figure out what the largest difference is between one value in Y
and a corresponding value in Y2
:
>> max(abs(Y(:) - Y2(:)))
ans =
5.32907051820075e-15
That's saying that the max error seen between both outputs is in the order of 10-15. I'd say that's pretty good.
Upvotes: 3