Nodoid
Nodoid

Reputation: 1561

Replacing a foreach with LINQ

I have some very simple code that I'm trying to get running marginally quicker (there are a lot of these small types of call dotted around the code which seems to be slowing things down) using LINQ instead of standard code.

The problem is this - I have a variable outside of the LINQ which the result of the LINQ query needs to add it.

The original code looks like this

double total = 0
foreach(Crop c in p.Crops)
{
    if (c.CropType.Type == t.Type)
       total += c.Area;
}
return total;

This method isn't slow until the loop starts getting large, then it slows on the phone. Can this sort of code be moved to a relatively quick and simple piece of LINQ?

Upvotes: 1

Views: 184

Answers (3)

Squid
Squid

Reputation: 4810

Looks like you could use sum: (edit: my syntax was wrong)

total = (from c in p.Crops
            where c.CropType.Type == t.Type
            select c.Area).Sum();

Or in extension method format:

total = p.Crops.Where(c => c.CropType.Type == t.Type).Sum(c => c.area);

As to people saying LINQ won't perform better where is your evidence? (The below is based on post from Hanselman? I ran the following in linqpad: (You will need to download and reference nbuilder to get it to run)

void Main()
{
    //Nbuilder is used to create a chunk of sample data
    //http://nbuilder.org
    var crops = Builder<Crop>.CreateListOfSize(1000000).Build();
    var t = new Crop();
    t.Type = Type.grain;

    double total = 0;

    var sw = new Stopwatch();
    sw.Start();

    foreach(Crop c in crops)
    {
        if (c.Type == t.Type)
            total += c.area;
    }
    sw.Stop();
    total.Dump("For Loop total:");
    sw.ElapsedMilliseconds.Dump("For Loop Elapsed Time:");


    sw.Restart();
    var result = crops.Where(c => c.Type == t.Type).Sum(c => c.area);
    sw.Stop();

    result.Dump("LINQ total:");
    sw.ElapsedMilliseconds.Dump("LINQ Elapsed Time:");


    sw.Restart();
    var result2 = (from c in crops
            where c.Type == t.Type
            select c.area).Sum();

    result.Dump("LINQ (sugar syntax) total:");
    sw.ElapsedMilliseconds.Dump("LINQ (sugar syntax) Elapsed Time:");
}


public enum Type
{
    wheat,
    grain,
    corn,
    maize,
    cotton
}

public class Crop
{
    public string Name { get; set; }
    public Type Type { get; set; }
    public double area;
}

The results come out favorably to LINQ:

For Loop total: 99999900000

For Loop Elapsed Time: 25

LINQ total: 99999900000

LINQ Elapsed Time: 17

LINQ (sugar syntax) total: 99999900000

LINQ (sugar syntax) Elapsed Time: 17

Upvotes: 3

Brian Rasmussen
Brian Rasmussen

Reputation: 116401

This is a pretty straight forward sum, so I doubt you will see any benefit from using LINQ.

You haven't told us much about the setup here, but here's an idea. If p.Crops is large and only a small number of the items in the sequence are of the desired type, you could build another sequence that contains just the items you need.

I assume that you know the type when you insert into p.Crops. If that's the case you could easily insert the relevant items in another collection and use that instead for the sum loop. That will reduce N and get rid of the comparison. It will still be O(N) though.

Upvotes: 1

Dave Cousineau
Dave Cousineau

Reputation: 13148

The main way to optimize this would be changing p, which may or may not be possible.

Assuming p is a P, and looks something like this:

internal sealed class P
{
   private readonly List<Crop> mCrops = new List<Crop>();

   public IEnumerable<Crop> Crops { get { return mCrops; } }

   public void Add(Crop pCrop)
   {
      mCrops.Add(pCrop);
   }
}

(If p is a .NET type like a List<Crop>, then you can create a class like this.)

You can optimize your loop by maintaining a dictionary:

internal sealed class P
{
   private readonly List<Crop> mCrops = new List<Crop>();

   private readonly Dictionary<Type, List<Crop>> mCropsByType
      = new Dictionary<Type, List<Crop>>();

   public IEnumerable<Crop> Crops { get { return mCrops; } }

   public void Add(Crop pCrop)
   {
      if (!mCropsByType.ContainsKey(pCrop.CropType.Type))
         mCropsByType.Add(pCrop.CropType.Type, new List<Crop>());

      mCropsByType[pCrop.CropType.Type].Add(pCrop);
      mCrops.Add(pCrop);
   }

   public IEnumerable<Crop> GetCropsByType(Type pType)
   {
      return mCropsByType.ContainsKey(pType)
         ? mCropsByType[pType]
         : Enumerable.Empty<Crop>();
   }
}

Your code then becomes something like:

double total = 0
foreach(Crop crop in p.GetCropsByType(t.Type))
   total += crop.Area;

return total;

Another possibility that would be even faster is:

internal sealed class P
{
   private readonly List<Crop> mCrops = new List<Crop>();

   private double mTotalArea;

   public IEnumerable<Crop> Crops { get { return mCrops; } }

   public double TotalArea { get { return mTotalArea; } }

   public void Add(Crop pCrop)
   {   
      mCrops.Add(pCrop);
      mTotalArea += pCrop.Area;
   }
}

Your code would then simply access the TotalArea property and you wouldn't even need a loop:

return p.TotalArea;

You might also consider extracting the code that manages the Crops data to a separate class, depending on what P is.

Upvotes: 1

Related Questions