Reputation: 13011
Imagine the following type:
public struct Account
{
public int Id;
public double Amount;
}
What is the best algorithm to synchronize two IList<Account>
in C# 2.0 ? (No linq) ?
The first list (L1) is the reference list, the second (L2) is the one to synchronize according to the first:
The Id identifies accounts. It's no too hard to find a naive and working algorithm, but I would like to know if there is a smart solution to handle this scenario without ruining readability and perfs.
EDIT :
Upvotes: 11
Views: 5736
Reputation: 1491
I have implemented two algorithms, one for sorted and one for sequential collections. Both support null values and duplicates, and work in the same way:
They yield return CollectionModification<LeftItemType,RightItemType>
s which is similar to CollectionChangedEventArgs<T>
(reference) which can be used in return to synchronize a collection.
Regarding yield return:
When using one or the other algorithm where your left items (the reference collection) is compared to right items, you can apply each yield returned
CollectionModification
as soon as they are yield returned , but this can result in a "collection was modified"-exception (for example when usingList<T>.GetEnumerator
). To prevent this, both algorithms have the ability implemented to use an indexable collection as reference collection that is about to be mutated. You only have to wrap the reference collection withYieldIteratorInfluencedReadOnlyList<ItemType>
(abstract) by using the extensions methods in YieldIteratorInfluencedReadOnlyListExtensions. :)
The first algorithm works for ascended or descended ordered lists and uses IComparer<T>
.
/// <summary>
/// The algorithm creates modifications that can transform one collection into another collection.
/// The collection modifications may be used to transform <paramref name="leftItems"/>.
/// Assumes <paramref name="leftItems"/> and <paramref name="rightItems"/> to be sorted by that order you specify by <paramref name="collectionOrder"/>.
/// Duplications are allowed but take into account that duplications are yielded as they are appearing.
/// </summary>
/// <typeparam name="LeftItemType">The type of left items.</typeparam>
/// <typeparam name="RightItemType">The type of right items.</typeparam>
/// <typeparam name="ComparablePartType">The type of the comparable part of left item and right item.</typeparam>
/// <param name="leftItems">The collection you want to have transformed.</param>
/// <param name="getComparablePartOfLeftItem">The part of left item that is comparable with part of right item.</param>
/// <param name="rightItems">The collection in which <paramref name="leftItems"/> could be transformed.</param>
/// <param name="getComparablePartOfRightItem">The part of right item that is comparable with part of left item.</param>
/// <param name="collectionOrder">the presumed order of items to be used to determine <see cref="IComparer{T}.Compare(T, T)"/> argument assignment.</param>
/// <param name="comparer">The comparer to be used to compare comparable parts of left and right item.</param>
/// <param name="yieldCapabilities">The yieldCapabilities that regulates how <paramref name="leftItems"/> and <paramref name="rightItems"/> are synchronized.</param>
/// <returns>The collection modifications.</returns>
/// <exception cref="ArgumentNullException">Thrown when non-nullable arguments are null.</exception>
public static IEnumerable<CollectionModification<LeftItemType, RightItemType>> YieldCollectionModifications<LeftItemType, RightItemType, ComparablePartType>(
IEnumerable<LeftItemType> leftItems,
Func<LeftItemType, ComparablePartType> getComparablePartOfLeftItem,
IEnumerable<RightItemType> rightItems,
Func<RightItemType, ComparablePartType> getComparablePartOfRightItem,
SortedCollectionOrder collectionOrder,
IComparer<ComparablePartType> comparer,
CollectionModificationsYieldCapabilities yieldCapabilities)
Python algorithm inspiration taken from: Efficient synchronization of two instances of an ordered list.
The second algorithm works for any order and uses IEqualityComparer<T>
.
/// <summary>
/// The algorithm creates modifications that can transform one collection into another collection.
/// The collection modifications may be used to transform <paramref name="leftItems"/>.
/// The more the collection is synchronized in an orderly way, the more efficient the algorithm is.
/// Duplications are allowed but take into account that duplications are yielded as they are appearing.
/// </summary>
/// <typeparam name="LeftItemType">The type of left items.</typeparam>
/// <typeparam name="RightItemType">The type of right items.</typeparam>
/// <typeparam name="ComparablePartType">The type of the comparable part of left item and right item.</typeparam>
/// <param name="leftItems">The collection you want to have transformed.</param>
/// <param name="getComparablePartOfLeftItem">The part of left item that is comparable with part of right item.</param>
/// <param name="rightItems">The collection in which <paramref name="leftItems"/> could be transformed.</param>
/// <param name="getComparablePartOfRightItem">The part of right item that is comparable with part of left item.</param>
/// <param name="equalityComparer">The equality comparer to be used to compare comparable parts.</param>
/// <param name="yieldCapabilities">The yield capabilities, e.g. only insert or only remove.</param>
/// <returns>The collection modifications.</returns>
/// <exception cref="ArgumentNullException">Thrown when non-nullable arguments are null.</exception>
public static IEnumerable<CollectionModification<LeftItemType, RightItemType>> YieldCollectionModifications<LeftItemType, RightItemType, ComparablePartType>(
IEnumerable<LeftItemType> leftItems,
Func<LeftItemType, ComparablePartType> getComparablePartOfLeftItem,
IEnumerable<RightItemType> rightItems,
Func<RightItemType, ComparablePartType> getComparablePartOfRightItem,
IEqualityComparer<ComparablePartType>? equalityComparer,
CollectionModificationsYieldCapabilities yieldCapabilities)
where ComparablePartType : notnull
One of the following frameworks is required
Both algorithms are created with custom implemented types (IndexDirectory
, NullableKeyDictionary
, LinkedBucketList
to name a few), so I can not simply copy paste the code here, so I would like to reference you to my following packages:
Account:
public class Account
{
public Account(int id) =>
Id = id;
public int Id { get; }
public double Amount { get; }
}
And the following collection item equality comparer class:
AccountEqualityComparer:
public class AccountEqualityComparer : EqualityComparer<Account>
{
public new static AccountEqualityComparer Default = new AccountEqualityComparer();
public override bool Equals([AllowNull] Account x, [AllowNull] Account y) =>
ReferenceEquals(x, y) || (!(x is null && y is null) && x.Id.Equals(y.Id));
public override int GetHashCode([DisallowNull] Account obj) =>
obj.Id;
}
AccountCollectionViewModel:
using Teronis.Collections.Algorithms.Modifications;
using Teronis.Collections.Synchronization;
using Teronis.Collections.Synchronization.Extensions;
using Teronis.Reflection;
public class AccountCollectionViewModel : SyncingCollectionViewModel<Account, Account>
{
public AccountCollectionViewModel()
: base(CollectionSynchronizationMethod.Sequential(AccountEqualityComparer.Default))
{
// In case of SyncingCollectionViewModel, we have to pass a synchronization method.
//
// Sequential means any order
//
}
protected override Account CreateSubItem(Account superItem) =>
superItem;
protected override void ApplyCollectionItemReplace(in ApplyingCollectionModificationBundle modificationBundle)
{
foreach (var (oldItem, newItem) in modificationBundle.OldSuperItemsNewSuperItemsModification.YieldTuplesForOldItemNewItemReplace())
{
// Implementation detail: update left public property values by right public property values.
TeronisReflectionUtils.UpdateEntityVariables(oldItem, newItem);
}
}
}
Program:
using System.Diagnostics;
using System.Linq;
class Program
{
static void Main()
{
// Arrange
var collection = new AccountCollectionViewModel();
var initialData = new Account[] {
new Account(5) { Amount = 0 },
new Account(7) { Amount = 0 },
new Account(3) { Amount = 0 }
};
var newData = new Account[] {
new Account(5) { Amount = 10 },
/* Account by ID 7 got removed .. */
/* but account by ID 8 is new. */
new Account(8) { Amount = 10 },
new Account(3) { Amount = 10 }
};
// Act
collection.SynchronizeCollection(initialData);
// Assert
Debug.Assert(collection.SubItems.ElementAt(1).Id == 7, "The account at index 1 has not the ID 7.");
Debug.Assert(collection.SubItems.All(x => x.Amount == 0), "Not all accounts have an amount of 0.");
// Act
collection.SynchronizeCollection(newData);
// Assert
Debug.Assert(collection.SubItems.ElementAt(1).Id == 8, "The account at index 1 has not the ID 8.");
Debug.Assert(collection.SubItems.All(x => x.Amount == 10), "Not all accounts have an amount of 10.");
;
}
}
You can see that I use SyncingCollectionViewModel, a very "heavy" type. That's because I have not finished the lightweight SynchronizableCollection implementation yet (virtual methods for add, remove, replace and so on are missing).
Upvotes: 0
Reputation: 735
I had the same problem, and my best solution was the following (adapted to your case), having both lists loaded:
I recommend implementing the IEquatable<>
interface in your Account class (or just override the Equals()
method) so it always compares IDs on methods that require comparison between objects:
public struct Account : IEquatable<Account>
{
public int Id;
public double Amount;
public bool Equals(Account other)
{
if (other == null) return false;
return (this.Id.Equals(other.Id));
}
}
The sync algorithm would be something like this (make sure both lists are initialized so no errors occurr):
L1.ForEach (L1Account =>
{
var L2Account = L2.Find(a => a.Id == L1Account.id);
// If found, update values
if (L2Account != null)
{
L1Account.Amount = L2Account.Amount;
L2.Remove(L2Account);
}
// If not found, remove it
else
{
L1.Remove(L1Account);
}
}
// Add any remaining L2 Account to L1
L1.AddRange(L2);
Upvotes: 1
Reputation: 8214
I know this is an old post but you should check out AutoMapper. It will do exactly what you want in a very flexible and configurable way.
Upvotes: 0
Reputation: 91895
If your two lists are sorted, then you can simply walk through them in tandem. This is an O(m+n) operation. The following code could help:
class Program
{
static void Main()
{
List<string> left = new List<string> { "Alice", "Charles", "Derek" };
List<string> right = new List<string> { "Bob", "Charles", "Ernie" };
EnumerableExtensions.CompareSortedCollections(left, right, StringComparer.CurrentCultureIgnoreCase,
s => Console.WriteLine("Left: " + s), s => Console.WriteLine("Right: " + s), (x,y) => Console.WriteLine("Both: " + x + y));
}
}
static class EnumerableExtensions
{
public static void CompareSortedCollections<T>(IEnumerable<T> source, IEnumerable<T> destination, IComparer<T> comparer, Action<T> onLeftOnly, Action<T> onRightOnly, Action<T, T> onBoth)
{
EnumerableIterator<T> sourceIterator = new EnumerableIterator<T>(source);
EnumerableIterator<T> destinationIterator = new EnumerableIterator<T>(destination);
while (sourceIterator.HasCurrent && destinationIterator.HasCurrent)
{
// While LHS < RHS, the items in LHS aren't in RHS
while (sourceIterator.HasCurrent && (comparer.Compare(sourceIterator.Current, destinationIterator.Current) < 0))
{
onLeftOnly(sourceIterator.Current);
sourceIterator.MoveNext();
}
// While RHS < LHS, the items in RHS aren't in LHS
while (sourceIterator.HasCurrent && destinationIterator.HasCurrent && (comparer.Compare(sourceIterator.Current, destinationIterator.Current) > 0))
{
onRightOnly(destinationIterator.Current);
destinationIterator.MoveNext();
}
// While LHS==RHS, the items are in both
while (sourceIterator.HasCurrent && destinationIterator.HasCurrent && (comparer.Compare(sourceIterator.Current, destinationIterator.Current) == 0))
{
onBoth(sourceIterator.Current, destinationIterator.Current);
sourceIterator.MoveNext();
destinationIterator.MoveNext();
}
}
// Mop up.
while (sourceIterator.HasCurrent)
{
onLeftOnly(sourceIterator.Current);
sourceIterator.MoveNext();
}
while (destinationIterator.HasCurrent)
{
onRightOnly(destinationIterator.Current);
destinationIterator.MoveNext();
}
}
}
internal class EnumerableIterator<T>
{
private readonly IEnumerator<T> _enumerator;
public EnumerableIterator(IEnumerable<T> enumerable)
{
_enumerator = enumerable.GetEnumerator();
MoveNext();
}
public bool HasCurrent { get; private set; }
public T Current
{
get { return _enumerator.Current; }
}
public void MoveNext()
{
HasCurrent = _enumerator.MoveNext();
}
}
You'll have to be careful about modifying the collections while iterating over them, though.
If they're not sorted, then comparing every element in one with every element in the other is O(mn), which gets painful really quickly.
If you can bear to copy the key values from each collection into a Dictionary or similar (i.e. a collection with acceptable performance when asked "is X present?"), then you could come up with something reasonable.
Upvotes: 2
Reputation: 19612
In addition to Jon Skeet's comment make your Account struct a class and override the Equals() and GetHashCode() method to get nice equality checking.
Upvotes: 0
Reputation: 1502586
For a start I'd get rid of the mutable struct. Mutable value types are a fundamentally bad thing. (As are public fields, IMO.)
It's probably worth building a Dictionary so you can easily compare the contents of the two lists. Once you've got that easy way of checking for presence/absence, the rest should be straightforward.
To be honest though, it sounds like you basically want L2 to be a complete copy of L1... clear L2 and just call AddRange? Or is the point that you also want to take other actions while you're changing L2?
Upvotes: 5