Reputation: 662
I'm running a method to transactionalize data stored in a ConcurrentQueue&lt;T&gt;. In CPU performance profiling, the major hit appears to be:
foreach (Item inSequence in items.Where(w => w.SequenceNumber == i.SequenceNumber && w.Device == i.Device)) {}
With 1,000 and 10,000 items it is actually quite quick. At 100,000 items the performance becomes critical: that specific LINQ query goes from taking about 4.5% of the total runtime CPU to over 58%. I'm assuming the performance hit is specifically due to the size of the ConcurrentQueue, but I'm not sure what to do about it. If avoiding a LINQ query resolved the issue, that would be fine; I'm just stuck on what to do. Is there some other concurrent type that would be more performant?
It's a CQ because the data is built and read asynchronously. However, during this particular method, which happens after the data is built and prior to it being read back out, it's running on a single thread.
Very loose sample is here: https://dotnetfiddle.net/hjDOva
using System;
using System.Diagnostics;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    static int count = 100000;

    public static void Main()
    {
        var items = new ConcurrentQueue<Item>();
        var r = new Random();
        for (int i = 0; i < count; i++)
        {
            items.Enqueue(new Item());
        }
        var sw = Stopwatch.StartNew();
        foreach (Item i in items.DistinctBy(d => new { d.SequenceNumber, d.Device }))
            foreach (Item inSequence in items.Where(w => w.Device == i.Device && w.SequenceNumber == i.SequenceNumber))
            {
            }
        Console.WriteLine(sw.Elapsed);
    }
}
public static class Extensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
public class Item
{
    #region Fields
    protected bool fixDates;
    protected string randomSerial;
    protected decimal amount;
    protected string device;
    protected DateTime depositTime;
    public int SequenceNumber = -1;

    [NonSerialized()]
    protected System.Random rnd = new Random(Int32.Parse(Guid.NewGuid().ToString().Substring(0, 8), System.Globalization.NumberStyles.HexNumber));
    #endregion

    #region Properties
    public bool FixDates
    {
        get { return this.fixDates; }
        set { this.fixDates = value; }
    }

    public string Amount
    {
        get { return this.amount.ToString(); }
        set { this.amount = Convert.ToDecimal(value); }
    }

    public string RandomSerial
    {
        get { return randomSerial; }
        set { randomSerial = value; }
    }

    public string Device
    {
        get { return this.device; }
        set { this.device = value; }
    }

    public DateTime DepositTime
    {
        get { return this.depositTime; }
        set { this.depositTime = value; }
    }
    #endregion

    #region Constructors
    public Item()
    {
        fixDates = false;
        RandomSerial = Guid.NewGuid().ToString().Substring(0, 8);
        this.amount = 5.00m;
        this.device = "IC" + rnd.Next(6).ToString();
        this.depositTime = DateTime.Now;
        this.SequenceNumber = rnd.Next(10);
    }
    #endregion
}
However, dotNetFiddle doesn't offer the memory required to run it with 100,000 items.
As to questions about using a CQ: yes, I understand that queues are not ideal for this. The tool generates data to test imports for various product types. There's only a single product that requires the method where this falls, Transactionalize(). Most of the time this code is not used.
It's a CQ because the system creates the objects in parallel (this was a significant performance improvement when it happened) and in most cases they are dequeued in parallel as well.
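Since Transactionalize() runs single-threaded, one option worth sketching is to snapshot the queue into a plain array once and do the grouping in a single pass, rather than re-enumerating the ConcurrentQueue for every distinct key (which is what makes the nested Where O(n²)). This is a minimal sketch, assuming a cut-down Item with only the two fields involved:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class SnapshotDemo
{
    // Minimal stand-in for the question's Item type (assumed shape).
    public class Item
    {
        public int SequenceNumber;
        public string Device;
    }

    // Snapshot the queue once, then group in a single O(n) pass instead of
    // re-enumerating the ConcurrentQueue inside a nested Where (O(n^2)).
    public static int CountGroups(ConcurrentQueue<Item> items)
    {
        Item[] snapshot = items.ToArray(); // one snapshot, then plain array access
        int groups = 0;
        foreach (var group in snapshot.GroupBy(i => (i.SequenceNumber, i.Device)))
        {
            groups++;
            foreach (Item inSequence in group)
            {
                // per-item transactionalizing work goes here
            }
        }
        return groups;
    }
}
```

The snapshot also guarantees a stable view of the data even if another thread were to enqueue during processing.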
Upvotes: 0
Views: 1841
Reputation: 43996
Assuming that the intention of the code below is to process the items in groups, with each group having the same SequenceNumber and Device,

foreach (Item i in items.DistinctBy(d => new { d.SequenceNumber, d.Device }))
    foreach (Item inSequence in items
        .Where(w => w.Device == i.Device && w.SequenceNumber == i.SequenceNumber))
    {
    }
...you can achieve the same thing much more efficiently by using the LINQ method GroupBy, like this:
var groups = items.GroupBy(i => (i.SequenceNumber, i.Device));
foreach (IGrouping<(int, string), Item> group in groups)
    foreach (Item inSequence in group)
    {
    }
Notice that instead of anonymous types I used the more lightweight ValueTuples as keys; being value types, they don't create heap allocations that need garbage collection.
If you also want to be able to search for a specific group later, again very efficiently, use the similar ToLookup instead of GroupBy.
Upvotes: 2