greenasds
greenasds

Reputation: 43

Optimizing two-dimensional interval partitioning for multi-dimensional matrix data

There is a four-dimensional matrix data, along with corresponding X coordinates data_X and Y coordinates data_Y for each element. I want to perform a two-dimensional interval segmentation, or grid partitioning, based on edge_X and edge_Y, resulting in a new array data_class. The current loop implementation is very slow. How can I improve the computational speed?

clc;clear
rng("default")
data = randi([0, 100], 40,30,20, 10);
data_X =  randi([0, 100], 40,30,20, 10);
data_Y =  randi([0, 100], 40,30,20, 10);

% Define the edges for histograms in X and Y directions
edge_X = [0:10:100];
edge_Y = [0:20:100];

sz = size(data);
for ix = 1:sz(1)
    for iy = 1:sz(2)
        for ii = 1:sz(3)
            for ij = 1:sz(4)
               % Extract the current data, X, and Y values
                g_data = data(ix, iy, ii, ij);
                g_X = data_X(ix, iy, ii, ij);
                g_Y = data_Y(ix, iy, ii, ij);
    
                % Compute the 2D histogram count  
                g_id = histcounts2(g_X, g_Y, edge_X, edge_Y);
                g_new = g_id * g_data;
                data_class  (ix,iy,ii,ij,:,:) = g_new; 
            end
        end
    end
end

Since each element of data is processed individually, g_data will only be noted as 1 in a bin, and then

g_new = g_id * g_data; 

g_new is the position of g_data after reclassification

The current loop implementation is very slow. How can I improve the computational speed?

Upvotes: 1

Views: 39

Answers (1)

Wolfie
Wolfie

Reputation: 30101

The first thing you should be doing is looking at the orange underlined warning in your editor about data_class growing with each loop iteration. Initialising it before all of your loops with this saves a large portion of the runtime immediately

Pre-allocate data_class before the loops:

szout = [sz, [numel(edge_X),numel(edge_Y)]-1];
data_class = NaN(szout);

But the main bottleneck is calling histcounts2 on every value individually, you can make large time savings by bringing it outside of the ii/ij loops, and instead of using the first input as a count of 1 for the bin it's in, use the 4th and 5th outputs which store the bin each value should go into. In your case this is the row and column indices.

Here are some timings:

  • Baseline code: 20.53sec
  • Pre-allocating data_class: 7.75sec (-62% total runtime)
  • De-nest histcounts2: 0.53sec (-97% total runtime)
  • Only assign non-zero values: 0.10sec (-99.5% total runtime)

De-nest histcounts2

sz = size(data);
szout = [sz, [numel(edge_X),numel(edge_Y)]-1];
data_class = NaN(szout);
for ix = 1:sz(1)
    for iy = 1:sz(2)

        g_X = data_X(ix, iy, :, :);
        g_Y = data_Y(ix, iy, :, :);
        [~,~,~,bx,by] = histcounts2(g_X, g_Y, edge_X, edge_Y);
        bx = squeeze(bx);
        by = squeeze(by);
        g_data = squeeze( data(ix, iy, :, :) );
            
        for ii = 1:sz(3)
            for ij = 1:sz(4)
                g_new = zeros(szout(end-1:end));
                g_new(bx(ii,ij),by(ii,ij)) = g_data(ii,ij);
                data_class(ix,iy,ii,ij,:,:) = g_new; 
            end
        end
    end
end

Further, you don't need to initialise the array of zeros g_new for every slice just to set one value. Instead, initialise data_class as a matrix of zeros and just override the specific values you're interested in:

Only assign non-zero values

sz = size(data);
szout = [sz, [numel(edge_X),numel(edge_Y)]-1];
data_class = zeros(szout);
for ix = 1:sz(1)
    for iy = 1:sz(2)

        g_X = data_X(ix, iy, :, :);
        g_Y = data_Y(ix, iy, :, :);
        [~,~,~,bx,by] = histcounts2(g_X, g_Y, edge_X, edge_Y);
        bx = squeeze(bx);
        by = squeeze(by);
        g_data = squeeze( data(ix, iy, :, :) );
            
        for ii = 1:sz(3)
            for ij = 1:sz(4)
                data_class(ix,iy,ii,ij,bx(ii,ij),by(ii,ij)) = g_data(ii,ij); 
            end
        end
    end
end

You could do something with sub2ind to get rid of the ii/ij loops entirely, but in a quick test that didn't show any performance over just using the loops.

Upvotes: 1

Related Questions