cris.gomez
cris.gomez

Reputation: 3

Compare two collections optimization

I need to optimize the code in comparing and manipulation of its values.I have a two collection of data, let say collection A and collection B. In collection A is the list of all existing data, and in collection B is the consolidated of existing data and the new data to be added. Now, I need to check if collection B exists in collection A, if it exists I need to set or modify the specific data of collection A with the updated data of collection B. If the specific data of Collection B is not exists in A then I need to add it to collection A.

Here is the code that I have.

  public CollectionA Save(List<CollectionB> collectionb)
  var collectionA = collectionA.GetAll().OrderByDescending(x => x.SortOrder);
 foreach (var b in collectionb)
            {
                if (b.Id == 0)
                {
                    a.Add(b);
                    continue;
                }
                foreach (var a in collectionA.Where(aa => aa.Id == b.Id))
                {
                    a.ZoneCode = b.ZoneCode;
                    a.SortOrder = b.SortOrder;
                    a.Point1 = b.Point1;
                    a.Point2 = b.Point2;
                    a.Point3 = b.Point3;
                    a.RangeSpread = b.RangeSpread;
                    a.MidpointDifferential = b.MidpointDifferential;
                    a.AnnualIncentivePercent = b.AnnualIncentivePercent;
                    a.AnnualIncentiveAmount = b.AnnualIncentiveAmount;
                    a.GradeMisc1 = b.GradeMisc1;
                    a.GradeMisc2 = b.GradeMisc2;
                    a.ZoneComment = b.ZoneComment;
                }
            }

Upvotes: 0

Views: 157

Answers (2)

Robert Giesecke
Robert Giesecke

Reputation: 4314

So you have 2 large-ish lists and you want to find the changes between without actually comparing everyone against everyone.

You should create 2 dictionaries and index them by their key values. C# anonymous types are a good fit for this, because the compiler already generates GetHashCode and Equals based on the values.

This is a rather naive approach (as it could easily throw OutOfMemoryException because it does everything in-memory), but it should perform much better than just comparing everything from A with everything from B. This will only compare the values where the keys are the same:

class ListComparisonResult<TKey>
{
  public IList<TKey> NewKeys { get; private set; }
  public IList<TKey> OldKeys { get; private set; }
  public IList<TKey> ChangedKeys { get; private set; }

  public ListComparisonResult(IList<TKey> newKeys, IList<TKey> oldKeys, IList<TKey> changedKeys)
  {
    NewKeys = new ReadOnlyCollection<TKey>(newKeys);
    OldKeys = new ReadOnlyCollection<TKey>(oldKeys);
    ChangedKeys = new ReadOnlyCollection<TKey>(changedKeys);
  }
}

ListComparisonResult<TKey> GetChanges<TRow, TKey, TValues>(
  IEnumerable<TRow> collectionA, 
  IEnumerable<TRow> collectionB,  
  Func<TRow, TKey> keySelector, 
  Func<TRow, TValues> comparableValuesSelector)
{
  var byId = new
  {
    A = collectionA.ToDictionary(keySelector),
    B = collectionB.ToDictionary(keySelector),
  };

  var sameIds = new HashSet<TKey>(byId.A.Keys.Where(byId.B.ContainsKey));

  var changedIds = (from id in sameIds
                    let a = byId.A[id]
                    let b = byId.B[id]
                    where !comparableValuesSelector(a).Equals(comparableValuesSelector(b))
                    select id).ToList();

  var oldIds = byId.A.Keys.Where(id => !byId.B.ContainsKey(id)).ToList();
  var newIds = byId.B.Keys.Where(id => !byId.A.ContainsKey(id)).ToList();
  return new ListComparisonResult<TKey>(newIds, oldIds, changedIds);
}

This is how it could be used:

var r = new Random();
var collectionA = (from id in Enumerable.Range(0, 1000000)
                   select new
                   {
                     ID = id, 
                     Value1 = r.Next(1, 3),
                     Value2 = r.Next(0, 1) == 1,
                   }).ToList();

var collectionB = (from id in Enumerable.Range(58945, 1000000)
                   select new
                   {
                     ID = id, 
                     Value1 = r.Next(1, 3),
                     Value2 = r.Next(0, 1) == 1,
                   }).ToList();

var timer = Stopwatch.StartNew();
var changes = GetChanges(collectionA, collectionB, t => t.ID, t => new{t.Value1, t.Value2});
timer.Stop();
Console.WriteLine(new
{
  changedIds = changes.ChangedKeys.Count, 
  newIds = changes.NewKeys.Count, 
  oldIds = changes.OldKeys.Count, 
  timer.ElapsedMilliseconds
});

Upvotes: 1

Marton
Marton

Reputation: 831

Use the .FirstOrDefault method to look for an item with the same ID. If the item exists, remove it from the main collection, then add the new ( / updated) item. If there's no matching ID, just add the new item.

foreach (var b in collectionB)
{
   var itemToUpdate = collectionA.FirstOrDefault(a => a.Id == b.Id);
   if (itemToUpdate != null)
   {
      collectionA.Remove(itemToUpdate);
   }
   collectionA.Add(b);
}

edit:

You can use the .SingleOrDefault method if your list is expected to contain an items with unique ID's. The difference is, .SingleOrDefault will throw an exception if there are multiple items with the same ID.

Upvotes: 0

Related Questions