RAM
RAM

Reputation: 866

.NET Collection Classes

Group of related data like a list of parts etc., can be handled either using Arrays(Array of Parts) or using Collection. I understand that When Arrays are used, Insertion, Deletion and some other operations have performance impact when it is compared with Collections. Does this mean that Arrays are not used internally by the collections?, If so what is the data structure used for collections like List, Collection etc?

How the collections are handled internally?

Upvotes: 0

Views: 1126

Answers (3)

cdhowie
cdhowie

Reputation: 168958

List<T> uses an internal array. Removing/inserting items near the beginning of the list will be more expensive than doing the same near the end of the list, since the entire contents of the internal array need to be shifted in one direction. Also, once you try to add an item when the internal list is full, a new, bigger array will be constructed, the contents copied, and the old array discarded.

The Collection<T> class, when used with the parameterless constructor, uses a List<T> internally. So performance-wise they will be identical, with the exception of overhead caused by wrapping. (Essentially one more level of indirection, which is going to be negligible in most scenarios.)

LinkedList<T> is, as its name implies, a linked list. This will sacrifice iteration speed for insertion/removal speed. Since iterating means traversing pointers-to-pointers-to-pointers ad infinitum, this is going to take more work overall. Aside from the pointer traversal, two nodes may not be allocated anywhere near each other, reducing the effectiveness of CPU RAM caches.

However, the amount of time required to insert or remove a node is constant, since it requires the same number of operations no matter the state of the list. (This does not take into account any work that must be done to actually locate the item to remove, or to traverse the list to find the insertion point!)

If your primary concern with your collection is testing if something is in the collection, you might consider a HashSet<T> instead. Addition of items to the set will be relatively fast, somewhere between insertion into a list and a linked list. Removal of items will again be relatively fast. But the real gain is in lookup time -- testing if a HashSet<T> contains an item does not require iterating the entire list. On average it will perform faster than any list or linked list structure.

However, a HashSet<T> cannot contain equivalent items. If part of your requirements is that two items that are considered equal (by an Object.Equals(Object) overload, or by implementing IEquatable<T>) coexist independently in the collection, then you simply cannot use a HashSet<T>. Also, HashSet<T> does not guarantee insertion order, so you also can't use a HashSet<T> if maintaining some sort of ordering is important.

Upvotes: 4

Joel
Joel

Reputation: 19358

There are two basic ways to implement a simple collection:

  • contiguous array
  • linked list

Contiguous arrays have performance disadvantages for the operations you mentioned because the memory space of the collection is either preallocated or allocated based on the contents of the collection. Thus deletion or insertion requires moving many array elements to keep the entire collection contiguous and in the proper order.

Linked lists remove these issues because the items in the collection do not need to be stored in memory contiguously. Instead each element contains a reference to one or more of the other elements. Thus, when an insertion is made, the item in question is created anywhere in memory and only the references on one or two of the elements already in the collection need to be modified.

For example:

LinkedList<object> c = new LinkedList<object>(); // a linked list
object[] a = new object[] { }; // a contiguous array

This is simplified of course. The internals of LinkedList<> are doubtless more complex than a simple singly or doubly linked list, but that is the basic structure.

Upvotes: 1

JayPea
JayPea

Reputation: 9691

I think that some collection classes might use arrays internally as well as linked lists or something similar. The benefit of using collections from the System.Collections namespace instead of arrays, is that you do not need to spend any extra time writing code to perform update operations.

Arrays will always be more lightweight, and if you know some very good search algorithms, then you might even be able to use them more efficiently, but most of the the time you can avoid reinventing the wheel by using classes from System.Collections. These classes are meant to help the programmer avoid writing code that has already been written and tuned hundreds of times, so it is unlikely that you'll get a significant performance boost by manipulating arrays yourself.

When you need a static collection that doesn't require much adding, removing or editing, then perhaps it is a good time to use an array, since they don't require the extra memory that collections do.

Upvotes: -1

Related Questions