Gilad

Reputation: 6595

Matlab code is a lot faster than C# code, which is not expected

        int n = varRatio.Count * varRatio[0].Count;
        double[] y_0 = new double[var_ratio_ne.Count];
        double[] y_n = new double[y_0.Length];
        double[] var_map = new double[y_0.Length];
        double[] var_fa_map = new double[y_0.Length];
        for (int j = 0; j < var_width.Count; j++)
        {
            List<double> tempRow = new List<double>();
            for (int index = 0; index < var_ratio_ne.Count; index++)
            {
                y_0[index] = ( (var_ratio_ne[index] - var_thr[0]) / var_width[j]);
            }
            double inc = delta / var_width[j];
            for (int i = 0; i < var_thr.Count; i++)
            {
                if (var_thr[i] >= curr_max)
                {
                    break;
                }
                Parallel.For(0, y_0.Length, k =>
                {
                    y_n[k] = y_0[k] - i*inc;
                    var_map[k] = Math.Min(Math.Max(y_n[k], 0), 1);
                    var_fa_map[k] = (not_edge_map[k]*var_map[k]);
                });
                tempRow.Add(var_fa_map.Sum() / n);
            }
            var_measure.Add(tempRow);
        }

Here is the matlab code I'm converting:

curr_max = max(var_ratio(:));
N = numel(var_ratio);
for j = 1:numel(var_width)
    y_0 = (var_ratio_ne - var_thr(1))/var_width(j);
    inc = delta/var_width(j);
    z = not_edge_map;
    for i = 1:numel(var_thr)
        if var_thr(i)>=curr_max
            break;
        end
        y_n = y_0 - (i-1)*inc;
        var_map = min(max(y_n,0),1);
        var_fa_map = z.*var_map;
        var_measure(i,j) = sum(var_fa_map(:))/N;
        % optimization for matlab: pixels that didn't contribute to the false alarm in this
        % iteration will not contribute in the next one either, because the threshold increases, so we can throw them out
        ii = y_n>0;
        y_0 = y_0(ii);
        z = z(ii);
    end
end

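For reference, here is a minimal, untested sketch of how I understand that pixel-dropping trick would carry over to C#. The names y0Work, zWork, activeCount and kept are ones I'm introducing here; the idea is just to compact the surviving pixels to the front of the working arrays instead of filtering into new collections:

    // Sketch only: same structure as my loop above, but compacting the working
    // arrays in place so later thresholds iterate over fewer pixels, mirroring
    // y_0 = y_0(ii); z = z(ii); in the Matlab code.
    double[] y0Work = new double[var_ratio_ne.Count];
    double[] zWork = new double[var_ratio_ne.Count];
    for (int j = 0; j < var_width.Count; j++)
    {
        int activeCount = var_ratio_ne.Count;
        for (int index = 0; index < activeCount; index++)
        {
            y0Work[index] = (var_ratio_ne[index] - var_thr[0]) / var_width[j];
            zWork[index] = not_edge_map[index];
        }
        double inc = delta / var_width[j];
        List<double> tempRow = new List<double>();
        for (int i = 0; i < var_thr.Count; i++)
        {
            if (var_thr[i] >= curr_max)
            {
                break;
            }
            double total = 0d;
            int kept = 0;
            for (int k = 0; k < activeCount; k++)
            {
                double y_nk = y0Work[k] - i * inc;
                total += zWork[k] * Math.Min(Math.Max(y_nk, 0), 1);
                if (y_nk > 0)   // this pixel can still contribute at the next threshold
                {
                    y0Work[kept] = y0Work[k];
                    zWork[kept] = zWork[k];
                    kept++;
                }
            }
            activeCount = kept; // shrink the active prefix for the next iteration
            tempRow.Add(total / n);
        }
        // pad with zeros so every row has var_thr.Count entries
        while (tempRow.Count < var_thr.Count)
        {
            tempRow.Add(0);
        }
        var_measure.Add(tempRow);
    }
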
The sizes of the arrays are:

UPDATE: my code runs a lot faster after this change

    //N = numel(var_ratio);
    int n = varRatio.Count * varRatio[0].Count;
    double[] y_0 = new double[var_ratio_ne.Count];
    for (int j = 0; j < var_width.Count; j++)
    {
        for (int index = 0; index < var_ratio_ne.Count; index++)
        {
            y_0[index] = ( (var_ratio_ne[index] - var_thr[0]) / var_width[j]);
        }
        double inc = delta / var_width[j];
        // FindIndex returns -1 if no threshold reaches curr_max; process the whole list in that case
        int indexOF = var_thr.FindIndex(x => x >= curr_max);
        if (indexOF < 0)
        {
            indexOF = var_thr.Count;
        }
        double[] tempRow = new double[indexOF];
        Parallel.For(0, indexOF ,i =>
        {
            var total = 0d;
            for (int k = 0; k < y_0.Length; k++)
            {
                total += (not_edge_map[k] * Math.Min(Math.Max(y_0[k] - i * inc, 0), 1));
            }
            tempRow[i] = total/n;
        });

        List<double> tempRowList = new List<double>();
        //copy the results of Parallel compute
        for (int i = 0; i < indexOF; i++)
        {
            tempRowList.Add(tempRow[i]);
        }
        //fill the rest with zeros
        for (int i = indexOF; i < var_thr.Count; i++)
        {
            tempRowList.Add(0);
        }
        var_measure.Add(tempRowList);
    }

I think I'm over-computing something here. Even allowing for the fact that I'm running in Debug mode, the C# performance (minutes) is terrible compared to Matlab (~20 seconds).
Can you please help me optimize the runtime? I find it hard to understand why the Matlab code performs so much better than the C# code.

Upvotes: 0

Views: 476

Answers (1)

Stuart

Reputation: 5496

I'd suggest reducing large allocations, so this:

List<double> y_n = new List<double>();
List<double> var_map = new List<double>();
List<double> var_fa_map = new List<double>();
for (int k = 0; k < y_0.Count; k++)
{
    y_n.Add(y_0[k] - i * inc);
    var_map.Add(Math.Min(Math.Max(y_n[k], 0), 1));
    var_fa_map.Add(not_edge_map[k] * var_map[k]);
}
tempRow.Add(var_fa_map.Sum() / n);

becomes:

var total = 0d;
for (int k = 0; k < y_0.Count; k++)
{
    total += (not_edge_map[k] * Math.Min(Math.Max(y_0[k] - i * inc, 0), 1));
}       
tempRow.Add(total / n);

In my tests this halves the time, but your mileage may vary. There are certainly other optimizations to be made, such as further reducing allocations and combining some of the computational tasks, but I'd need representative inputs to profile it effectively; for example, I'm not sure whether making this parallel and switching to concurrent collections would have a positive effect.
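
If parallelism does turn out to help on your real input sizes, one option that avoids concurrent collections entirely is the Parallel.For overload that takes thread-local state, so each thread accumulates its own partial sum and synchronization happens only once per thread. A rough sketch, reusing the names from your inner loop (i, inc, y_0, not_edge_map, n, tempRow):

    // Sketch: may or may not beat the plain sequential sum, depending on array sizes.
    double total = 0d;
    object gate = new object();
    Parallel.For(0, y_0.Count,
        () => 0d,                                // per-thread partial sum
        (k, state, partial) =>
            partial + not_edge_map[k] * Math.Min(Math.Max(y_0[k] - i * inc, 0), 1),
        partial => { lock (gate) { total += partial; } }); // merge each thread's sum once
    tempRow.Add(total / n);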

Upvotes: 1
