Reputation: 489
I'm using a List<T>
and I need to update the objects properties that the list has.
What would be the most efficient/faster way to do this? I know that scanning through the index of a List<T>
would be slower as this list grows and that the List<T>
is not the most efficient collection to do updates.
That sad, would be better to:
Stub code sample:
public class Product
{
public int ProductId { get; set; }
public string ProductName { get; set; }
public string Category { get; set; }
}
public class ProductRepository
{
List<Product> product = Product.GetProduct();
public void UpdateProducts(IEnumerable<Product> updatedProduct)
{
}
public void UpdateProduct(Product updatedProduct)
{
}
}
Upvotes: 1
Views: 2519
Reputation: 31596
What exactly is efficiency?
Unless there are literally thousands of items doing a foreach, or for or any other type of looping operation will most likely only show differences in the milleseconds. Really? Hence you have wasted more time (in costs of a programmer at $XX per hour than an end user costs) trying to find that best.
So if you have literally thousands of records I would recommend that efficiency be found by parallel processing the list with the Parallel.Foreach method which can process more records to save time with the overhead of threading.
IMHO if the record count is greater than 100 it implies that there is a database being used. If a database is involved, write an update sproc and call it a day; I would be hard pressed to write a one-off program to do a specific update which could be done in an easier fashion in said database.
Upvotes: 1
Reputation: 11482
Your use case is updating a
List<T>
, which can contains millions of records, and updated records can be a sub-list or just a single record
Following is the Schema:
public class Product
{
public int ProductId { get; set; }
public string ProductName { get; set; }
public string Category { get; set; }
}
Does
Product
contains a primary key, which means everyProduct
object can be uniquely identified and there are no duplicates and every update target a single unique record?
If Yes, then it is best to arrange the List<T>
in the form of Dictionary<int,T>
, which would mean for an IEnumerable<T>
every update would be an O(1)
time complexity and that would mean all the updates could be done depending on the size of the IEnumerable<T>
, which i don't expect to be very big and though there would be extra memory allocation of different data structure required, but would be a very fast solution.@JamieLupton has already provided a solution on similar lines
In case
Product
is repeated, there's no primary key, then above solution is not valid, then ideal way to scan through theList<T>
is Binary Search, whose time complexity isO(logN)
Now since size of IEnumerable<T>
is comparatively small say M, so the overall time complexity would be O(M*logN)
, where M is much smaller than N and can be neglected.
List<T>
support Binary Search API, which provides the element index, which can then be used to update the object at relevant index, check example here
Best Option as per me for such a high number of records would be parallel processing along with binary search
Now since, thread safety is an issue, what I normally do is divide a List<T>
into List<T>[]
, since then each unit can be assigned to a separate thread, a simple way is use MoreLinq
batch Api, where you can fetch the number of system processors as using Environment.ProcessorCount
and then create IEnumerable<IEnumerable<T>>
as follows:
var enumerableList = List<T>.Batch(Environment.ProcessorCount).ToList();
Another way is following custom code:
public static class MyExtensions
{
// data - List<T>
// dataCount - Calculate once and pass to avoid accessing the property everytime
// Size of Partition, which can be function of number of processors
public static List<T>[] SplitList<T>(this List<T> data, int dataCount, int partitionSize)
{
int remainderData;
var fullPartition = Math.DivRem(dataCount, partitionSize, out remainderData);
var listArray = new List<T>[fullPartition];
var beginIndex = 0;
for (var partitionCounter = 0; partitionCounter < fullPartition; partitionCounter++)
{
if (partitionCounter == fullPartition - 1)
listArray[partitionCounter] = data.GetRange(beginIndex, partitionSize + remainderData);
else
listArray[partitionCounter] = data.GetRange(beginIndex, partitionSize);
beginIndex += partitionSize;
}
return listArray;
}
}
Now you can create Task[]
, where each Task
is assigned for every element List<T>
, on the List<T>[]
generated above, then Binary search for each sub partition. Though its repetitive but would be using the power of Parallel processing and Binary search. Each Task
can be started and then we can wait using Task.WaitAll(taskArray)
to wait for Task processing to finish
Over and above that, if you want to create a Dictionary<int,T>[]
and thus use parallel processing then this would be fastest.
Final integration of List<T>[]
to List<T>
can be done using Linq Aggregation
or SelectMany
as follows:
List<T>[] splitListArray = Fetch splitListArray;
// Process splitListArray
var finalList = splitListArray.SelectMany(obj => obj).ToList()
Another option would be to use
Parallel.ForEach
along with a thread safe data structure likeConcurrentBag<T>
or may beConcurrentDictionary<int,T>
in case you are replacing complete object, but if its property update then a simpleList<T>
would work.Parallel.ForEach
internally use range partitioner similar to what I have suggested above
Solutions mentioned above ideally depends on your use case, you shall be able to use combination to achieve the best possible result. Let me know, in case you need specific example
Upvotes: 1
Reputation: 118
You could consider using Dictionary instead of List if you want fast lookups. In your case it would be the product Id (which I am assuming is unique). Dictionary MSDN
For example:
public class ProductRepository
{
Dictionary<int, Product> products = Product.GetProduct();
public void UpdateProducts(IEnumerable<Product> updatedProducts)
{
foreach(var productToUpdate in updatedProducts)
{
UpdateProduct(productToUpdate);
}
///update code here...
}
public void UpdateProduct(Product productToUpdate)
{
// get the product with ID 1234
if(products.ContainsKey(productToUpdate.ProductId))
{
var product = products[productToUpdate.ProductId];
///update code here...
product.ProductName = productToUpdate.ProductName;
}
else
{
//add code or throw exception if you want here.
products.Add(productToUpdate.ProductId, productToUpdate);
}
}
}
Upvotes: 2