Gilad

Reputation: 6595

Matlab code is a lot faster than C# code, which is not expected

        int n = varRatio.Count * varRatio[0].Count;
        double[] y_0 = new double[var_ratio_ne.Count];
        double[] y_n = new double[y_0.Length];
        double[] var_map = new double[y_0.Length];
        double[] var_fa_map = new double[y_0.Length];
        for (int j = 0; j < var_width.Count; j++)
        {
            List<double> tempRow = new List<double>();
            for (int index = 0; index < var_ratio_ne.Count; index++)
            {
                y_0[index] = ( (var_ratio_ne[index] - var_thr[0]) / var_width[j]);
            }
            double inc = delta / var_width[j];
            for (int i = 0; i < var_thr.Count; i++)
            {
                if (var_thr[i] >= curr_max)
                {
                    break;
                }
                Parallel.For(0, y_0.Length, k =>
                {
                    y_n[k] = y_0[k] - i*inc;
                    var_map[k] = Math.Min(Math.Max(y_n[k], 0), 1);
                    var_fa_map[k] = (not_edge_map[k]*var_map[k]);
                });
                tempRow.Add(var_fa_map.Sum() / n);
            }
            var_measure.Add(tempRow);
        }

Here is the matlab code I'm converting:

curr_max = max(var_ratio(:));
N = numel(var_ratio);
for j = 1:numel(var_width)
    y_0 = (var_ratio_ne - var_thr(1))/var_width(j);
    inc = delta/var_width(j);
    z = not_edge_map;
    for i = 1:numel(var_thr)
        if var_thr(i)>=curr_max
            break;
        end
        y_n = y_0 - (i-1)*inc;
        var_map = min(max(y_n,0),1);
        var_fa_map = z.*var_map;
        var_measure(i,j) = sum(var_fa_map(:))/N;
        % optimization for matlab: pixels that didn't contribute to the false alarm in this
        % iteration will not contribute in the next one either, because the threshold increases, so we can throw them out
        ii = y_n>0;
        y_0 = y_0(ii);
        z = z(ii);
    end
end

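For reference, here is a minimal, untested sketch of how I understand that pixel-dropping trick would carry over to C#. The names y0Work, zWork, activeCount and kept are ones I'm introducing here; the idea is just to compact the surviving pixels to the front of the working arrays instead of filtering into new collections:

    // Sketch only: same structure as my loop above, but compacting the working
    // arrays in place so later thresholds iterate over fewer pixels, mirroring
    // y_0 = y_0(ii); z = z(ii); in the Matlab code.
    double[] y0Work = new double[var_ratio_ne.Count];
    double[] zWork = new double[var_ratio_ne.Count];
    for (int j = 0; j < var_width.Count; j++)
    {
        int activeCount = var_ratio_ne.Count;
        for (int index = 0; index < activeCount; index++)
        {
            y0Work[index] = (var_ratio_ne[index] - var_thr[0]) / var_width[j];
            zWork[index] = not_edge_map[index];
        }
        double inc = delta / var_width[j];
        List<double> tempRow = new List<double>();
        for (int i = 0; i < var_thr.Count; i++)
        {
            if (var_thr[i] >= curr_max)
            {
                break;
            }
            double total = 0d;
            int kept = 0;
            for (int k = 0; k < activeCount; k++)
            {
                double y_nk = y0Work[k] - i * inc;
                total += zWork[k] * Math.Min(Math.Max(y_nk, 0), 1);
                if (y_nk > 0)   // this pixel can still contribute at the next threshold
                {
                    y0Work[kept] = y0Work[k];
                    zWork[kept] = zWork[k];
                    kept++;
                }
            }
            activeCount = kept; // shrink the active prefix for the next iteration
            tempRow.Add(total / n);
        }
        // pad with zeros so every row has var_thr.Count entries
        while (tempRow.Count < var_thr.Count)
        {
            tempRow.Add(0);
        }
        var_measure.Add(tempRow);
    }
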
The sizes of the arrays are:

UPDATE: my code runs a lot faster after this change

    //N = numel(var_ratio);
    int n = varRatio.Count * varRatio[0].Count;
    double[] y_0 = new double[var_ratio_ne.Count];
    for (int j = 0; j < var_width.Count; j++)
    {
        for (int index = 0; index < var_ratio_ne.Count; index++)
        {
            y_0[index] = ( (var_ratio_ne[index] - var_thr[0]) / var_width[j]);
        }
        double inc = delta / var_width[j];
        // FindIndex returns -1 if no threshold reaches curr_max; process the whole list in that case
        int indexOF = var_thr.FindIndex(x => x >= curr_max);
        if (indexOF < 0)
        {
            indexOF = var_thr.Count;
        }
        double[] tempRow = new double[indexOF];
        Parallel.For(0, indexOF ,i =>
        {
            var total = 0d;
            for (int k = 0; k < y_0.Length; k++)
            {
                total += (not_edge_map[k] * Math.Min(Math.Max(y_0[k] - i * inc, 0), 1));
            }
            tempRow[i] = total/n;
        });

        List<double> tempRowList = new List<double>();
        //copy the results of Parallel compute
        for (int i = 0; i < indexOF; i++)
        {
            tempRowList.Add(tempRow[i]);
        }
        //fill the rest with zeros
        for (int i = indexOF; i < var_thr.Count; i++)
        {
            tempRowList.Add(0);
        }
        var_measure.Add(tempRowList);
    }

I think I'm over-computing something here. Even allowing for the fact that I'm running in Debug mode, the C# performance (minutes) is terrible compared to Matlab (~20 seconds).
Can you please help me optimize the runtime? I find it hard to understand why the Matlab code performs so much better than the C# code.

Upvotes: 0

Views: 476

Answers (1)

Stuart

Reputation: 5496

I'd suggest reducing large allocations, so this:

List<double> y_n = new List<double>();
List<double> var_map = new List<double>();
List<double> var_fa_map = new List<double>();
for (int k = 0; k < y_0.Count; k++)
{
    y_n.Add(y_0[k] - i * inc);
    var_map.Add(Math.Min(Math.Max(y_n[k], 0), 1));
    var_fa_map.Add(not_edge_map[k] * var_map[k]);
}
tempRow.Add(var_fa_map.Sum() / n);

becomes:

var total = 0d;
for (int k = 0; k < y_0.Count; k++)
{
    total += (not_edge_map[k] * Math.Min(Math.Max(y_0[k] - i * inc, 0), 1));
}       
tempRow.Add(total / n);

In my tests this halves the time, but your mileage may vary. There are certainly other optimizations to be made, such as further reducing allocations and combining some of the computational tasks, but I'd need representative inputs to profile it effectively; for example, I'm not sure whether making this parallel and switching to concurrent collections would have a positive effect.
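
If parallelism does turn out to help on your real input sizes, one option that avoids concurrent collections entirely is the Parallel.For overload that takes thread-local state, so each thread accumulates its own partial sum and synchronization happens only once per thread. A rough sketch, reusing the names from your inner loop (i, inc, y_0, not_edge_map, n, tempRow):

    // Sketch: may or may not beat the plain sequential sum, depending on array sizes.
    double total = 0d;
    object gate = new object();
    Parallel.For(0, y_0.Count,
        () => 0d,                                // per-thread partial sum
        (k, state, partial) =>
            partial + not_edge_map[k] * Math.Min(Math.Max(y_0[k] - i * inc, 0), 1),
        partial => { lock (gate) { total += partial; } }); // merge each thread's sum once
    tempRow.Add(total / n);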

Upvotes: 1
