Reputation: 32392
I'm creating an app that holds loads and loads of user data in memory, and it's mostly keeping it all in List<T> structures (and some Dictionary<TKey,TValue> when I need lookup).
And I'm wondering...
How efficient are Lists? How much memory overhead do I get for each of them (that is, memory beyond what the objects they contain would take)? How much of a penalty do I pay every time I instantiate a new one?
Is there a more efficient way?
Dictionaries are just hash tables, right? Or are they a less efficient data structure?
I'd like to use Arrays, but I have the typical problem of adding and removing things from them all the time, so having to grow/shrink them would be a pain.
Any ideas/suggestions?
Edit: I know my basic Data Structures 101, and why a linked list is better for adding/removing and a hash table is better for random access.
I'm mostly concerned about .NET's idiosyncrasies: how much memory each of these structures wastes, for example, and the time wasted on initializing/killing them.
Things like, for example, if it takes a lot of time to instantiate/GC a List, but not much to clear it, maybe I should keep a little pool of Lists waiting for me, and clear them and send them back to the pool when done, instead of simply dereferencing them. Something like the sketch below.
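(A minimal sketch of that pooling idea; the ListPool<T> name and its Rent/Return methods are made up for illustration. Note that List<T>.Clear() keeps the internal array, so a recycled list doesn't have to re-grow from scratch.)

    using System.Collections.Generic;

    class ListPool<T>
    {
        private readonly Stack<List<T>> _pool = new Stack<List<T>>();

        public List<T> Rent()
        {
            // Hand back a recycled list if we have one, else make a new one.
            return _pool.Count > 0 ? _pool.Pop() : new List<T>();
        }

        public void Return(List<T> list)
        {
            list.Clear();        // O(n) clear, but the internal array is kept
            _pool.Push(list);
        }
    }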
Or, if hash tables are faster for access but waste a lot of memory, I might prefer to use Lists and traverse them for small item counts.
I'd also really like to focus on memory usage, since my app is hideously memory intensive (think memcached-like)... Does anyone know where I can find such info?
Upvotes: 9
Views: 1544
Reputation: 2381
The .NET List doesn't use a linked list. It is an array underneath: it starts with a capacity of 4 (allocated when you add the first item) and doubles its capacity each time it fills up. So performance can vary a bit depending on how you use it.
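You can watch that growth policy yourself in a couple of lines; the numbers in the comments are what the current implementation happens to produce, not a documented contract:

    using System;
    using System.Collections.Generic;

    class CapacityDemo
    {
        static void Main()
        {
            var list = new List<int>();
            Console.WriteLine(list.Capacity);   // 0 -- nothing allocated yet
            for (int i = 1; i <= 9; i++)
            {
                list.Add(i);
                // Capacity prints 4, 4, 4, 4, 8, 8, 8, 8, 16 -- it doubles
                // whenever the internal array fills up.
                Console.WriteLine("Count={0}, Capacity={1}", list.Count, list.Capacity);
            }
        }
    }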
If you're using VS 2008, run the profiler before you get too far down this rat hole. When we started actually looking at where we were losing time, it didn't take long for us to figure out that debating the finer points of linked lists just really didn't matter.
Upvotes: 0
Reputation: 287430
I wouldn't move a finger until there's an actual performance problem and a profiler has shown you where it is. Then you'll have a definitive problem to solve, and it'll be much easier.
Upvotes: 0
Reputation: 14189
If you really want to see all the gory details of how List<> and Dictionary<,> are implemented, use the wonderfully useful .NET Reflector.
See also the documentation for the excellent C5 Generic Collection Library, which has very good implementations of a number of collection types missing from the BCL.
Upvotes: 2
Reputation: 13414
I think the two-process thing might be overkill; plus, the interprocess communication will likely have some slowness (although I've never tried such a thing, so take my opinion with a grain of salt). I work on a data-driven application where each data unit is tiny, but we may have upwards of a billion data units at any given time. The method we use is basically to keep the data units on disk and load into memory only the ones currently being worked on, evicting them when they're no longer needed.
In other words, it's a homebrew caching scheme. The benefit is that you can control with pinpoint accuracy what data is in memory, which you cannot do if you rely on the OS paging scheme. If some commonly used variable ends up mixed in with your data on a page, that page will be hit repeatedly, preventing it from going to disk. If you design into your application an accommodation that some data requests will take longer than others, this will work pretty well, particularly if you know which chunks you will need ahead of time (we don't).
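As a rough illustration (the UnitCache and LoadFromDisk names are hypothetical, not our actual code), such a scheme can be as simple as a dictionary plus an LRU list with an eviction budget:

    using System.Collections.Generic;

    class UnitCache
    {
        private readonly int _maxInMemory;
        private readonly Dictionary<long, LinkedListNode<KeyValuePair<long, byte[]>>> _map =
            new Dictionary<long, LinkedListNode<KeyValuePair<long, byte[]>>>();
        private readonly LinkedList<KeyValuePair<long, byte[]>> _lru =
            new LinkedList<KeyValuePair<long, byte[]>>();

        public UnitCache(int maxInMemory) { _maxInMemory = maxInMemory; }

        public byte[] Get(long id)
        {
            LinkedListNode<KeyValuePair<long, byte[]>> node;
            if (_map.TryGetValue(id, out node))
            {
                _lru.Remove(node);      // promote to most recently used
                _lru.AddFirst(node);
                return node.Value.Value;
            }
            byte[] data = LoadFromDisk(id);   // hypothetical disk read
            _map[id] = _lru.AddFirst(new KeyValuePair<long, byte[]>(id, data));
            if (_map.Count > _maxInMemory)    // over budget: evict the LRU unit
            {
                LinkedListNode<KeyValuePair<long, byte[]>> last = _lru.Last;
                _lru.RemoveLast();
                _map.Remove(last.Value.Key);
            }
            return data;
        }

        private byte[] LoadFromDisk(long id)
        {
            return new byte[0];   // stub: read the unit's file from disk here
        }
    }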
Keep in mind that everything in a .NET app has to fit within 2 GB of address space (in a 32-bit process), and because of the way the GC works and the overhead of your app, you probably have somewhat less than that to work with.
To spy on exactly what your heap looks like and who is allocating, use the CLR profiler: http://www.microsoft.com/downloads/details.aspx?familyid=86ce6052-d7f4-4aeb-9b7a-94635beebdda&displaylang=en
Upvotes: 1
Reputation: 13414
If you are concerned about memory usage, the real key is to store your array on disk and map just the parts you need into memory when you need them.
The key is to use FILE_FLAG_NO_BUFFERING and always read/write exactly one sector's worth of data.
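A minimal sketch of that approach: FileOptions doesn't name FILE_FLAG_NO_BUFFERING, so its raw value (0x20000000) is cast in and passed through to the underlying CreateFile call; the 4096-byte sector size and the ReadSector helper are assumptions for illustration (real code should query the volume for its actual sector size).

    using System;
    using System.IO;

    class UnbufferedReader
    {
        const int SECTOR_SIZE = 4096;                       // assumed sector size
        const FileOptions NoBuffering = (FileOptions)0x20000000;

        public static byte[] ReadSector(string path, long sectorIndex)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, SECTOR_SIZE, NoBuffering))
            {
                var buffer = new byte[SECTOR_SIZE];
                fs.Seek(sectorIndex * SECTOR_SIZE, SeekOrigin.Begin); // sector-aligned offset
                fs.Read(buffer, 0, SECTOR_SIZE);                      // exactly one sector
                return buffer;
            }
        }
    }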
Upvotes: 1
Reputation: 2271
The LinkedList object would take less time to add to and remove from because of the nature of linked lists. When you add an element, it does not have to resize an array the way a normal List does. Other than that improvement, I would suspect that a LinkedList would perform about the same as a normal List.
See this on Wikipedia: Linked Lists vs. Arrays
Upvotes: 2
Reputation: 37807
If you need efficient insertion or removal at arbitrary places in the list, there is the LinkedList<T> data structure; the MSDN article gives details. Obviously, being a linked list, random access isn't efficient.
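For example, inserts and removals are O(1) once you already hold a node reference (finding that node in the first place is still an O(n) walk):

    using System;
    using System.Collections.Generic;

    class LinkedListDemo
    {
        static void Main()
        {
            var list = new LinkedList<int>();
            LinkedListNode<int> node = list.AddFirst(1);
            list.AddLast(3);
            list.AddAfter(node, 2);   // O(1): pointer updates, no array shifting
            list.Remove(node);        // O(1) removal when you already hold the node
            foreach (int x in list)
                Console.WriteLine(x); // prints 2, then 3
        }
    }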
Upvotes: 2
Reputation: 11436
Maybe you should consider using some type of in-memory database if you have that much data that has to be held in memory.
Upvotes: 4
Reputation: 29594
List<T> uses an array internally, and Dictionary<TKey,TValue> uses a hash table.
They are faster than the older non-generic classes ArrayList and Hashtable because you don't pay the cost of converting everything to/from object (boxing, unboxing, and type checking), and because MS optimized them better than the old classes.
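The boxing difference is easy to see side by side:

    using System;
    using System.Collections;
    using System.Collections.Generic;

    class BoxingDemo
    {
        static void Main()
        {
            ArrayList oldStyle = new ArrayList();
            oldStyle.Add(42);             // boxes the int into a heap object
            int a = (int)oldStyle[0];     // unbox + runtime type check

            List<int> newStyle = new List<int>();
            newStyle.Add(42);             // stored inline in the internal int[]
            int b = newStyle[0];          // no cast, no boxing
            Console.WriteLine(a + b);
        }
    }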
Upvotes: 2
Reputation: 13414
Lists are arrays underneath, so adding an item anywhere but at the end is very costly: every element after the insertion point has to be shifted.
Otherwise they are basically as fast as an array.
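For instance, building a list by inserting at the front is quadratic overall, while appending is (amortized) linear:

    using System.Collections.Generic;

    class InsertCostDemo
    {
        static void Main()
        {
            var list = new List<int>();
            // Each Insert(0, ...) shifts every existing element one slot
            // to the right, so this loop is O(n^2) overall; replacing it
            // with list.Add(i) would make it O(n).
            for (int i = 0; i < 100000; i++)
                list.Insert(0, i);
        }
    }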
Upvotes: 2