Reputation: 334
I would like to ask a question that interests me.
Which collection performs best when it holds a lot of items (more than 1 million)?
For example, I create a simple List with a capacity of 10,000,000 and try to add about 500,000 items. The first 30,000 items are added within 10 seconds of starting, but after 1 minute the collection contains only 60,000 items, and after 5 minutes only 150,000.
As I understand it, the time to add a new item grows non-linearly with the collection's memory usage (since each item itself takes roughly the same time to create). But I may be mistaken.
Edit: You are right, it is not clear enough without a sample. I am trying to fill a tree whose nodes are linked by parent IDs and stored in a flat list. You can find the sample code below.
using System.Collections.Generic;
using System.Linq;

public class Matrix
{
    public int Id { get; private set; }
    public byte[,] Items { get; private set; }
    public int ParentId { get; private set; }
    public int Lvl { get; private set; }

    public int HorizontalCounts
    {
        get { return 3; }
    }

    public int VerticalCounts
    {
        get { return 3; }
    }

    public Matrix(int id) : this(id, null, 0, 1)
    {
    }

    public Matrix(int id, byte[,] items, int parentId, int lvl)
    {
        Id = id;
        Items = items ?? new byte[HorizontalCounts, VerticalCounts];
        ParentId = parentId;
        Lvl = lvl;
    }

    public bool IsEmpty(int hCounter, int vCounter)
    {
        return Items[hCounter, vCounter] == 0;
    }

    public Matrix CreateChild(int id)
    {
        // the child starts from a copy of this node's cells, one level deeper
        return new Matrix(id, (byte[,])Items.Clone(), Id, Lvl + 1);
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        Matrix node = new Matrix(1);
        const int capacity = 10000000;
        List<Matrix> tree = new List<Matrix>(capacity) { node };
        FillTree(ref tree, ref node);
        int l1 = tree.Count(n => n.Lvl == 1);
        int l2 = tree.Count(n => n.Lvl == 2);
        int l3 = tree.Count(n => n.Lvl == 3);
        int l4 = tree.Count(n => n.Lvl == 4);
        int l5 = tree.Count(n => n.Lvl == 5);
    }

    private static void FillTree(ref List<Matrix> tree, ref Matrix node)
    {
        for (int hCounter = 0; hCounter < node.HorizontalCounts; hCounter++)
        {
            for (int vCounter = 0; vCounter < node.VerticalCounts; vCounter++)
            {
                if (!node.IsEmpty(hCounter, vCounter))
                {
                    continue;
                }
                // scans the entire list on every insert (O(n) per added item)
                int childId = tree.Select(n => n.Id).Max() + 1;
                Matrix childNode = node.CreateChild(childId);
                childNode.Items[hCounter, vCounter] = 1;
                tree.Add(childNode);
                FillTree(ref tree, ref childNode);
            }
        }
    }
}
Latest edit: I am very sorry; the problem was not the number of items in the collection. The performance problem was in this line: int childId = (tree.Select(n => n.Id).Max() + 1); It scans the entire list for every inserted item. Thank you very much for your answers and comments.
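A minimal sketch of the fix, assuming the IDs only need to be unique and increasing: keep a running counter instead of scanning the whole list for the maximum on every insert.

    // Illustrative sketch, not from the original post: replace the O(n) scan
    // with an O(1) counter for the next free ID.
    private static int _nextId = 2; // the root node already uses ID 1

    private static void FillTree(List<Matrix> tree, Matrix node)
    {
        for (int hCounter = 0; hCounter < node.HorizontalCounts; hCounter++)
        {
            for (int vCounter = 0; vCounter < node.VerticalCounts; vCounter++)
            {
                if (!node.IsEmpty(hCounter, vCounter))
                {
                    continue;
                }
                Matrix childNode = node.CreateChild(_nextId++); // O(1) per insert
                childNode.Items[hCounter, vCounter] = 1;
                tree.Add(childNode);
                FillTree(tree, childNode);
            }
        }
    }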
Upvotes: 2
Views: 5704
Reputation: 81115
Unless the array is going to be created once and exist for the life of the application, I would be inclined to suggest some type of nested array, where the size of each inner array is kept below 8,000 bytes if it contains any double-precision floating-point numbers, or 85,000 bytes if it does not. Objects at or above those sizes get placed on the Large Object Heap. Unlike the ordinary heap, which can efficiently handle the creation and abandonment of many objects, the Large Object Heap handles this poorly under .NET 2.0-3.5, and only somewhat better under 4.0.
If you will not be doing insertions or deletions, I would suggest that it may be easiest to use an array of 1024 arrays of 1024 elements each. Accessing an element by index would be a simple matter of shifting the index right by ten, using the result to select an array, and then using the bottom 10 bits to find the item within the array.
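As a concrete illustration of that layout, here is a minimal sketch, assuming a fixed logical length; the BigArray name and its members are illustrative, not part of the answer:

    using System;

    public sealed class BigArray<T>
    {
        private const int PageBits = 10;            // 2^10 = 1024 elements per page
        private const int PageSize = 1 << PageBits;
        private const int PageMask = PageSize - 1;

        private readonly T[][] _pages;

        public BigArray(int length)
        {
            Length = length;
            _pages = new T[(length + PageSize - 1) >> PageBits][];
            for (int i = 0; i < _pages.Length; i++)
            {
                // each 1024-element page stays under the 85,000-byte LOH threshold
                // for element types up to roughly 80 bytes
                _pages[i] = new T[PageSize];
            }
        }

        public int Length { get; private set; }

        public T this[int index]
        {
            // top bits select the page, bottom 10 bits select the slot within it
            get { return _pages[index >> PageBits][index & PageMask]; }
            set { _pages[index >> PageBits][index & PageMask] = value; }
        }
    }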
If insertions and deletions will be required, I would suggest using a jagged array along with some sort of data structure to keep track of the logical length of each sub-array, and to help convert indices into array locations. Doing that would avoid the need to copy large amounts of data when performing an insert or delete, at the cost of more expensive subscripting operations.
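A rough sketch of that insert-friendly variant, assuming a linear scan over the per-chunk counts is acceptable (a tree over the counts would make the subscripting cheaper, as the answer hints); the ChunkedList name is illustrative, and deletion would be symmetric to Insert, shifting within one chunk and decrementing its count:

    using System;
    using System.Collections.Generic;

    public sealed class ChunkedList<T>
    {
        private const int ChunkSize = 1024;
        private readonly List<T[]> _chunks = new List<T[]>();
        private readonly List<int> _counts = new List<int>(); // logical length of each chunk

        public T this[int index]
        {
            get
            {
                int c = 0;
                while (index >= _counts[c]) { index -= _counts[c]; c++; }
                return _chunks[c][index];
            }
        }

        public void Insert(int index, T item)
        {
            // walk the counts to find the chunk holding the logical index
            int c = 0;
            while (c < _counts.Count && index > _counts[c]) { index -= _counts[c]; c++; }
            if (c == _chunks.Count) { _chunks.Add(new T[ChunkSize]); _counts.Add(0); }

            if (_counts[c] == ChunkSize)
            {
                // split the full chunk in half so the shift below stays cheap
                var right = new T[ChunkSize];
                Array.Copy(_chunks[c], ChunkSize / 2, right, 0, ChunkSize / 2);
                _chunks.Insert(c + 1, right);
                _counts[c] = ChunkSize / 2;
                _counts.Insert(c + 1, ChunkSize / 2);
                if (index > _counts[c]) { index -= _counts[c]; c++; }
            }

            // shift elements within one small chunk instead of the whole collection
            Array.Copy(_chunks[c], index, _chunks[c], index + 1, _counts[c] - index);
            _chunks[c][index] = item;
            _counts[c]++;
        }
    }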
Upvotes: 1
Reputation: 5020
When you're dealing with millions (or more) of items, it's best to use an array. Even if you waste a few thousand slots by making your array larger than absolutely necessary, the time efficiency gained may make up for the loss of space efficiency.
Of course, if you're dealing with an amount of data that's too large to store entirely in memory, a disk-based data structure is advisable.
Upvotes: 0
Reputation: 44776
You want an array if you know exactly how many items you will have beforehand. If you can allocate once and then simply fill it up, a simple array is perfect: no wasted memory, fastest to fill, fastest to remove from.
Upvotes: 0
Reputation: 24316
The answer to this is: it depends.
Are you going to be doing many inserts with no sorting? LinkedList.
Are you going to be doing a lot of lookups? HashMap/Dictionary.
Are you going to just have an unordered group of things? List and/or array.
Do you not want duplicates? Set.
Do you not want duplicates, but want fast lookups? HashSet.
Do you have an ordered list that is to be sorted by keys? TreeMap.
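Since the question is about C#, a minimal sketch mapping those partly Java-flavored names onto their System.Collections.Generic equivalents (the variable names are illustrative):

    using System.Collections.Generic;

    var manyInserts = new LinkedList<Matrix>();            // many inserts, no sorting
    var fastLookup  = new Dictionary<int, Matrix>();       // lots of lookups by key
    var unordered   = new List<Matrix>();                  // unordered group of things
    var noDupes     = new HashSet<int>();                  // no duplicates, fast lookup
    var sortedByKey = new SortedDictionary<int, Matrix>(); // kept ordered by keys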
Upvotes: 6
Reputation: 273169
If you want to add a million items, create the list like this:
var myList = new List<MyItem>(1500000);
Storing 1.5 million references (or small structs) isn't expensive; letting List's adaptive growth algorithm allocate the space as you go is. Each time the internal array fills up, the list allocates a new array of double the capacity and copies every existing element into it.
Upvotes: 3