Reputation: 395
I need a faster way to create an index file. The application generates pairs of items to be indexed. I currently add each pair as it is generated to a sorted dictionary and then write it out to a disk file. This works well until the number of items added exceeds one million, at which time it slows to the point that is unacceptable. There can be as many as three million data items to be indexed. I prefer to avoid a database because I do not want to significantly increase the size of the deployment package, which is now less than one-half of one megabyte. I tried Access but it is even slower than the sorted dictionary -if it had an efficient bulk load utility then that might work, but I do not find such a tool for Access.
Is there a better way to roll my own index?
Upvotes: 1
Views: 236
Reputation: 273244
Is the SortedDictionary really the bottleneck? As compared to the I/O ?
You really should profile this first to prevent optimizing the wrong parts.
But as a tip, when you have 1M or more items it is a good idea to pre-allocate your Dictionary. Give it an initial capacity of 2M or so.
//var index = new SortedDictionary(2 * 1024 * 1024); // not supported, it's a tree
var index = new SortedList(2 * 1024 * 1024);
If your Dictionary is the problem I would expect it to be from constant re-allocation sooner than from the actual index-searches.
Upvotes: 6
Reputation: 17909
Just a thought, but could you use an in-memory SQL solution, like SQL Lite ? It's just a small DLL, but will help your priorities, do your logic in C# and do your sorting in SQL.
Have a look here:
http://web.archive.org/web/20100208133236/http://www.mikeduncan.com/sqlite-on-dotnet-in-3-mins/
The Download for SQL Lite itself is only 253k, and the .net bindings are about 75k.
Upvotes: 1
Reputation: 48720
Is SQLite too big to deploy with your software? I'm going to agree with Henk that the constant reallocations within the SortedDictionary are probably the bottleneck. If that solution proves false, try using SQLite to see if that increases performance, and then you can decide on where to go from there.
Upvotes: 0