What data structure(s) are appropriate for my situation?

Question

To start off, this is probably an XY problem, sorry about that.

I'm loading a file table from a file and placing it into an in-memory file tree. Nodes in the tree represent a directory/file in the tree. Currently, I'm using two data structures per node and that results in noticeable load times, due to inserting into collections, and higher memory usage, due to duplicating string data and referencing each node twice. Trees are loaded once and are not mutated after that.

Every node has a List for accessing sorted child nodes and a Dictionary for accessing child nodes by name. The list is lazily sorted for performance reasons. SortedDictionary does not fit my use requirements because I need nodes with children be sorted above nodes without children, so passing an IComparer will not suffice. When two nodes both have/don't have children, then they are sorted lexicographically (OrdinalIgnoreCase).

Is there a data structure in .net which would fulfill my needs?

Additionally, is there any way to provide a hash for the key when inserting to a Dictionary, and to later get a portion of a bucket from the Dictionary (ie: GetValuesByHash(int hashValue) yields all values whose corresponding key has the given hash)? The file table which I am reading in already contains hash values for entire file paths (applicable to another thing I'm doing), and currently, the dictionary is just recalculating them for no reason.

I think I can hack together a solution by defining my own custom key which contains { Hash, Node } alongside a custom comparer, but that seems really ugly, and you wouldn't be able to get buckets of nodes sharing the same hash. If anything, that would still feel like using the wrong data structure.

I have already googled "c# dictionary get hash" along with a few other queries, though at this point, I haven't' seen any similar questions.

Overall, looking for a data structure (probably related to a dictionary) with the following properties:

ContainsKeyOfHash(), Get(hash): File Name Hash -> File Entry Descriptor(s)
ContainsKey(), Get(key): File Name -> File Entry Descriptor
Add(string fileName, Entry entry, int hash = gethash(fileName))

Entries are sorted as follows:

    m_children.Sort(
       (a, b) => {
          bool aHasChildren = a.HasChildren;
          bool bHasChildren = b.HasChildren;
          if (aHasChildren && !bHasChildren)
             return 1;
          if (!aHasChildren && bHasChildren)
             return -1;
          else
             return -String.Compare(a.m_resourceName, b.m_resourceName, StringComparison.OrdinalIgnoreCase);
       }
    );

All child nodes can be retrieved in the above sorted order. Currently, I have a ChildrenSorted and ChildrenUnsorted property. The ChildrenSorted property may have a performance hit due to sorting, while the ChildrenUnsorted property does not.

I think that at worse, my solution would be to write my own Dictionary-like class. I don't have to remove keys from the dictionary, so it shouldn't be hard. I sort of want to avoid doing that, though.

My node implementation can be viewed at: http://pastie.org/5547925

Thanks!

usr · Accepted Answer

I think your solution is already pretty good. Here are some thoughts:

For a collection that is both sorted and has fast access by key, I can only think of tree data structures. You probably don't want a data structure that allocates one object per item. Probably you are served best by a kind of heap where all items sit in a single array. I think you can build that structure very efficiently by first sorting all children and then populating them (just like you are doing it now).
You might consider stuffing all data into a single such tree. That would save you most the per-node overhead (like collections which themselves have child-objects). The key would be the "path" to the node, stored in some efficient format. It could either be a path like "d1\d2\filename" or a string[].

Point (2) would be how an RDBMS would do it.

What data structure(s) are appropriate for my situation?

Answers (2)

Related Questions