rwolst
rwolst

Reputation: 13672

Efficiently Calculate Frequency Averaged Periodogram Using GPU

In Matlab I am looking for a way to most efficiently calculate a frequency averaged periodogram on a GPU.

I understand that the most important thing is to minimise for loops and use the already built in GPU functions. However my code still feels relatively unoptimised and I was wondering what changes I can make to it to gain a better speed up.

r = 5;  % Dimension
n = 100;  % Time points
m = 20;  % Bandwidth of smoothing

% Generate some random rxn data 
X = rand(r, n);

% Generate normalised weights according to a cos window
w = cos(pi * (-m/2:m/2)/m);
w = w/sum(w);

% Generate non-smoothed Periodogram
FT = (n)^(-0.5)*(ctranspose(fft(ctranspose(X))));
Pdgm = zeros(r, r, n/2 + 1);
for j = 1:n/2 + 1
    Pdgm(:,:,j) = FT(:,j)*FT(:,j)';
end

% Finally smooth with our weights
SmPdgm = zeros(r, r, n/2 + 1);

% Take advantage of the GPU filter function
% Create new Periodogram WrapPdgm with m/2 values wrapped around in front and 
% behind it (it seems like there is redundancy here)
WrapPdgm = zeros(r,r,n/2 + 1 + m);
WrapPdgm(:,:,m/2+1:n/2+m/2+1) = Pdgm;
WrapPdgm(:,:,1:m/2) = flip(Pdgm(:,:,2:m/2+1),3);
WrapPdgm(:,:,n/2+m/2+2:end) = flip(Pdgm(:,:,n/2-m/2+1:end-1),3);

% Perform filtering
for i = 1:r
    for j = 1:r
        temp = filter(w, [1], WrapPdgm(i,j,:));
        SmPdgm(i,j,:) = temp(:,:,m+1:end);
    end
end

In particular, I couldn't see a way to optimise out the for loop when calculating the initial Pdgm from the Fourier transformed data and I feel the trick I play with the WrapPdgm in order to take advantage of filter() on the GPU feels unnecessary if there were a smooth function instead.

Upvotes: 1

Views: 163

Answers (3)

Divakar
Divakar

Reputation: 221584

Solution Code

This seems to be pretty efficient as benchmark runtimes in the next section might convince us -

%// Select the portion of FT to be processed and 
%// send copy to GPU for calculating everything
gFT = gpuArray(FT(:,1:n/2 + 1));

%// Perform non-smoothed Periodogram, thus removing the first loop
Pdgm1 = bsxfun(@times,permute(gFT,[1 3 2]),permute(conj(gFT),[3 1 2]));

%// Generate WrapPdgm right on GPU
WrapPdgm1 = zeros(r,r,n/2 + 1 + m,'gpuArray');
WrapPdgm1(:,:,m/2+1:n/2+m/2+1) = Pdgm1;
WrapPdgm1(:,:,1:m/2) = Pdgm1(:,:,m/2+1:-1:2);
WrapPdgm1(:,:,n/2+m/2+2:end) = Pdgm1(:,:,end-1:-1:n/2-m/2+1);

%// Perform filtering on GPU and get the final output, SmPdgm1
filt_data = filter(w,1,reshape(WrapPdgm1,r*r,[]),[],2);
SmPdgm1 = gather(reshape(filt_data(:,m+1:end),r,r,[]));

Benchmarking

Benchmarking Code

%// Input parameters
r = 50; % Dimension
n = 1000;  % Time points
m = 200;  % Bandwidth of smoothing

% Generate some random rxn data
X = rand(r, n);

% Generate normalised weights according to a cos window
w = cos(pi * (-m/2:m/2)/m);
w = w/sum(w);

% Generate non-smoothed Periodogram
FT = (n)^(-0.5)*(ctranspose(fft(ctranspose(X))));

tic,    %// ... Code from original approach,   toc
tic     %// ... Code from proposed approach,   toc

Runtime results thus obtained on GPU, GTX 750 Ti against CPU, I-7 4790K -

------------------------------ With Original Approach on CPU
Elapsed time is 0.279816 seconds.
------------------------------ With Proposed Approach on GPU
Elapsed time is 0.169969 seconds.

Upvotes: 1

Edric
Edric

Reputation: 25140

You can use pagefun on the GPU for the first loop. (Note that the implementation of cellfun is basically a hidden loop, whereas pagefun runs natively on the GPU using a batched GEMM operation). Here's how:

n = 16;
r = 8;
X = gpuArray.rand(r, n);
R = gpuArray.zeros(r, r, n/2 + 1);
for jj = 1:(n/2+1)
    R(:,:,jj) = X(:,jj) * X(:,jj)';
end
X2 = X(:,1:(n/2+1));
R2 = pagefun(@mtimes, reshape(X2, r, 1, []), reshape(X2, 1, r, []));
R - R2

Upvotes: 0

TallBrianL
TallBrianL

Reputation: 1252

To get rid of the first loop you can do the following:

Pdgm_cell = cellfun(@(x) x * x', mat2cell(FT(:, 1 : 51), [5], ones(51, 1)), 'UniformOutput', false);
Pdgm = reshape(cell2mat(Pdgm_cell),5,5,[]);

Then in your filter you can do the following:

temp = filter(w, 1, WrapPdgm, [], 3);
SmPdgm = temp(:, :, m + 1 : end);

The 3 lets the filter know to operate along the 3rd dimension of your data.

Upvotes: 0

Related Questions