Bob Bryan

Reputation: 3837

How to use LINQ with a 2 dimensional array

I have a 2-dimensional byte array that looks something like this:

0 0 0 0 1

1 1 1 1 0

0 0 1 1 1

1 0 1 0 1

Each value in the array can only be 0 or 1. The above simplified example shows 4 rows with each row having 5 columns. I am trying to figure out how to use LINQ to return the index to the row that has the largest number of 1s set, which in the above example should return 1.

The following non-LINQ C# code solves the problem:

static int GetMaxIndex(byte[,] TwoDArray)
{
   // This method finds the row with the greatest number of 1s set.
   //
   int NumRows = TwoDArray.GetLength(0);
   int NumCols = TwoDArray.GetLength(1);
   int RowCount, MaxRowCount = 0, MaxRowIndex = 0;
   //
   for (int LoopR = 0; LoopR < NumRows; LoopR++)
   {
      RowCount = 0;
      for (int LoopC = 0; LoopC < NumCols; LoopC++)
      {
         if (TwoDArray[LoopR, LoopC] != 0)
            RowCount++;
      }
      if (RowCount > MaxRowCount)
      {
         MaxRowCount = RowCount;
         MaxRowIndex = LoopR;
      }
   }
   return MaxRowIndex;
}

static void Main()
{
   byte[,] Array2D = new byte[4, 5] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
   int MaxInd = GetMaxIndex(Array2D);
   Console.WriteLine("MaxInd = {0}", MaxInd);
}

So, my questions are:

  1. How can LINQ be used to solve this, and would using LINQ here be less efficient than using the non-LINQ code above?
  2. Is it possible to solve this problem with PLINQ? Or, would it be more efficient to use the Task Parallel Library (TPL) directly for the above code and split out the count of the number of 1s in each row to a separate thread, assuming that each row has at least 1,000 columns?

Upvotes: 3

Views: 20627

Answers (5)

Peter Kingston

Reputation: 48

Looking at the issue, this is really a two-part answer about what "more efficient" means for your code. The loop you presented is already very lean on resources, but it could be clearer about its intent.

Based on the amount of data being moved around, even at 10x that size, PLINQ is going to be more resource intensive, simply because of the work involved in spinning up threads.

1.) Using LINQ can make this method more readable

Most 2D-array LINQ queries I've come across convert the array into a jagged array (an array of arrays) before searching. Here's an extension method that does that conversion for us and helps keep the query looking clean:

public static T[][] GetJagged<T>(this T[,] raw)
{
    int lenX = raw.GetLength(0);
    int lenY = raw.GetLength(1);

    T[][] jagged = new T[lenX][];

    for (int x = 0; x < lenX; x++)
    {
        jagged[x] = new T[lenY];
        for (int y = 0; y < lenY; y++)
        {
            jagged[x][y] = raw[x, y];
        }
    }

    return jagged;
}

Now all that's left is to query each row of the jagged array and return its sum. Here I use the identity selector (b => b), which simply hands each byte to the Sum method.

static int GetMaxIndexLINQ(byte[,] TwoDArray)
{
    byte[][] jagged = TwoDArray.GetJagged();

    // Materialize the row sums once so we don't enumerate the query twice.
    int[] rowSums = (from bitRow in jagged
                     select bitRow.Sum(b => b)).ToArray();

    int maxSum = rowSums.Max();
    int maxRowIndex = Array.IndexOf(rowSums, maxSum);
    return maxRowIndex;
}

This way comes out very legible, and even if the reader is new to coding it's pretty easy to get the gist of what's happening here.

I'd also point out that making your code more readable is itself a form of efficiency. Teamwork makes the dream work, and the quicker a teammate can make sense of what your code is doing, the better for everyone.

2.) Optimizing for performance

As I said before, there isn't a lot happening here that can be made any leaner; extra method calls or unnecessary checks will just slow the process down.

That being said, there is one small change that gives some easy optimization. Because in this instance we are dealing only with 1s and 0s, we can skip the comparison entirely: rather than checking whether a value is 0 or not, it is faster to just add the value into our running sum, avoiding a branch on every element.

static int GetMaxIndex_EvenBetter(byte[,] TwoDArray)
{
    int NumRows = TwoDArray.GetLength(0);
    int NumCols = TwoDArray.GetLength(1);
    int RowCount, MaxRowCount = 0, MaxRowIndex = 0;

    for (int row = 0; row < NumRows; row++)
    {
        RowCount = 0;

        for (int col = 0; col < NumCols; col++)
        {
            RowCount += TwoDArray[row, col]; //See my change here
        }
        if (RowCount > MaxRowCount)
        {
            MaxRowCount = RowCount;
            MaxRowIndex = row;
        }
    }

    return MaxRowIndex;
}

In most other cases you aren't working with just 1s and 0s, so you DO want to check those values before adding them; here, however, the check is unnecessary.
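As a side note: on newer runtimes (.NET 6 and later, where Enumerable.MaxBy exists) the max-then-IndexOf dance can be collapsed into a single query. A minimal sketch, assuming .NET 6+; like the original loop, ties go to the first row:

```csharp
using System;
using System.Linq;

class Program
{
    // Assumes .NET 6+, where Enumerable.MaxBy is available.
    // Returns the index of the row with the largest sum of 1s.
    static int GetMaxIndexMaxBy(byte[,] TwoDArray) =>
        Enumerable.Range(0, TwoDArray.GetLength(0))
                  .MaxBy(row => Enumerable.Range(0, TwoDArray.GetLength(1))
                                          .Sum(col => (int)TwoDArray[row, col]));

    static void Main()
    {
        byte[,] Array2D = { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
        Console.WriteLine(GetMaxIndexMaxBy(Array2D)); // prints 1
    }
}
```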

Upvotes: 1

Nick Strupat

Reputation: 5063

1) You can do it with LINQ this way...

private static int GetMaxIndex(byte[,] TwoDArray) {
    return Enumerable.Range(0, TwoDArray.GetLength(0))
                     .Select(
                         x => new {
                             Index = x,
                             Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
                         })
                     .OrderByDescending(x => x.Count)
                     .First()
                     .Index;
}

... you'd have to test it to see if LINQ is faster or slower.
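If you do want to test it, a quick Stopwatch sketch could look like the following. This is not a rigorous benchmark (no warm-up, single run, results vary by runtime and data); both method bodies are just copies of the loop and LINQ approaches from this page:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    // Plain nested-loop version, as in the question.
    static int GetMaxIndexLoop(byte[,] a)
    {
        int best = 0, bestCount = -1;
        for (int r = 0; r < a.GetLength(0); r++)
        {
            int count = 0;
            for (int c = 0; c < a.GetLength(1); c++)
                count += a[r, c];
            if (count > bestCount) { bestCount = count; best = r; }
        }
        return best;
    }

    // LINQ version, as in this answer.
    static int GetMaxIndexLinq(byte[,] a) =>
        Enumerable.Range(0, a.GetLength(0))
                  .Select(r => new { r, count = Enumerable.Range(0, a.GetLength(1)).Count(c => a[r, c] == 1) })
                  .OrderByDescending(x => x.count)
                  .First().r;

    static void Main()
    {
        // Fixed seed so both methods see the same data.
        var rnd = new Random(42);
        var data = new byte[2000, 1000];
        for (int r = 0; r < data.GetLength(0); r++)
            for (int c = 0; c < data.GetLength(1); c++)
                data[r, c] = (byte)rnd.Next(2);

        var sw = Stopwatch.StartNew();
        int loopResult = GetMaxIndexLoop(data);
        sw.Stop();
        Console.WriteLine("loop: {0} ({1} ms)", loopResult, sw.ElapsedMilliseconds);

        sw.Restart();
        int linqResult = GetMaxIndexLinq(data);
        sw.Stop();
        Console.WriteLine("linq: {0} ({1} ms)", linqResult, sw.ElapsedMilliseconds);
    }
}
```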

2) It is possible to use PLINQ. Just use ParallelEnumerable.Range for the row index generator:

private static int GetMaxIndex2(byte[,] TwoDArray) {
    return ParallelEnumerable.Range(0, TwoDArray.GetLength(0))
                             .Select(
                                 x => new {
                                     Index = x,
                                     Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
                                 })
                             .OrderByDescending(x => x.Count)
                             .First()
                             .Index;
}

Upvotes: 1

Los Frijoles

Reputation: 4821

Here is how I would do it. It's more or less the same as the others, but without any Enumerable.Range calls (not that there is anything wrong with those; I use them all the time. They just make the code more indented in this case).

This one also includes PLINQ stuff. Plain async/await (as opposed to the TPL's data-parallel constructs like Parallel.For) wouldn't be suitable here, because this work is computationally bound and async/await is better suited to I/O-bound operations. Your code would end up executing sequentially if you used async/await rather than PLINQ: async/await only overlaps work when a thread is released at an await point (so the next task can start, which could then run concurrently), and a purely synchronous, CPU-only function never actually awaits; it just runs all the way through. Basically, the first item in your list would finish before the next one even started, making the whole thing sequential. PLINQ explicitly starts parallel tasks and doesn't have this issue.

//arry is your 2d byte array (byte[,] arry)
var maxIndex = arry
    .Cast<byte>() //cast the entire array into bytes
    .AsParallel() //make the transition to PLINQ (remove this to not use it)
    .Select((b, i) => new // create indexes
        {
            value = b,
            index = i
        })
    .GroupBy(g => g.index / arry.GetLength(1)) // group it by rows
    .Select(g => new
        {
            sum = g.Sum(g2 => (int)g2.value), //sum each row
            index = g.Key //the row number; with AsParallel the groups may arrive out of order, so a positional index here would be wrong
        })
    .OrderByDescending(g => g.sum) //max by sum
    .Select(g => g.index) //grab the index
    .First(); //this should be the highest index

In terms of efficiency, you would probably get better results with your for loop. The question I would ask is, which is more readable and clear?
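Since the question also asks about using the TPL directly, here is a sketch using Parallel.For, which (unlike async/await) is designed for exactly this kind of CPU-bound row splitting. This is one possible shape, not the only one: each iteration writes only its own slot of a sums array, so no locking is needed, and the final scan for the maximum runs sequentially:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static int GetMaxIndexParallelFor(byte[,] arry)
    {
        int rows = arry.GetLength(0), cols = arry.GetLength(1);
        var sums = new int[rows];

        // Each iteration writes only sums[row], so no synchronization is required.
        Parallel.For(0, rows, row =>
        {
            int sum = 0;
            for (int col = 0; col < cols; col++)
                sum += arry[row, col];
            sums[row] = sum;
        });

        // Sequential scan for the row with the largest sum (first row wins on ties).
        int maxIndex = 0;
        for (int row = 1; row < rows; row++)
            if (sums[row] > sums[maxIndex])
                maxIndex = row;
        return maxIndex;
    }

    static void Main()
    {
        var arry = new byte[,] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };
        Console.WriteLine(GetMaxIndexParallelFor(arry)); // prints 1
    }
}
```

Note that for small arrays the scheduling overhead will swamp any gain; this only starts paying off around the row sizes (1,000+ columns) mentioned in the question.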

Upvotes: 1

Eric

Reputation: 5743

// This code is extracted from
// http://www.codeproject.com/Articles/170662/Using-LINQ-and-Extension-Methods-in-C-to-Sort-Vect
private static IEnumerable<T[]> ConvertToSingleDimension<T>(T[,] source)
{
    T[] arRow;
    for (int row = 0; row < source.GetLength(0); ++row)
    {
        arRow = new T[source.GetLength(1)];
        for (int col = 0; col < source.GetLength(1); ++col)
            arRow[col] = source[row, col];
        yield return arRow;
    }
}


// Convert byte[,] to a sequence of { T[] Values, int Index } for the LINQ query
var result = (from item in ConvertToSingleDimension(Array2D).Select((i, index) => new { Values = i, Index = index })
              orderby item.Values.Sum(i => i) descending, item.Index
              select item.Index).FirstOrDefault();

Upvotes: 0

slvnperron

Reputation: 1343

It's hard to work with multidimensional arrays in LINQ, but here's how you could do it:

var arr = new [,] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };

var data =
    Enumerable.Range(0, arr.GetLength(0)) // number of rows
        .Select(
            row =>
                new
                {
                    index = row,
                    count = Enumerable.Range(0, arr.GetLength(1)).Select(col => arr[row, col]).Count(x => x == 1)
                })
        .OrderByDescending(x => x.count)
        .Select(x => x.index)
        .First();

Upvotes: 6
