Reputation: 2457
I have a rather general question regarding the lookup speed of any dictionary that uses a string as a key, and I couldn't find an answer so far.
Within my current program I have a dictionary of custom objects. The keys I use are filenames including the full path of the file, so that no key can occur twice.
My question is: does the time to find a specific object within the dictionary depend significantly on the length of the string used as a key? After all, if I have a large amount of data saved within my object and I access that data in a loop via myDictionary[Key] on every single iteration, the lookup alone might take a long time and make the loop run longer.
My solution to this problem would be: in the case of an array, let's say a double[,,]
within my object, I temporarily create a new array variable and set it equal to the one within the dictionary, so I don't have to search through the dictionary on every single loop iteration.
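The approach described above, looking the value up once before the loop instead of on every iteration, can be sketched like this (shown in Java rather than the asker's C#; the path, key, and `double[]` payload are made-up sample data):

```java
import java.util.HashMap;
import java.util.Map;

public class HoistLookup {
    public static void main(String[] args) {
        // Hypothetical dictionary keyed by full file paths (sample data).
        Map<String, double[]> byPath = new HashMap<>();
        byPath.put("/data/project/run1/output.bin", new double[] {1.0, 2.0, 3.0});

        String key = "/data/project/run1/output.bin";

        // Look the array up ONCE, before the loop, so the long key string
        // is hashed and compared only a single time.
        double[] values = byPath.get(key);

        double sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];   // no per-iteration dictionary lookup
        }
        System.out.println(sum); // 6.0
    }
}
```

Note that this copies only the array *reference*, not the array's contents, so there is no extra memory cost.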
Upvotes: 2
Views: 194
Reputation: 172606
Does the time to find the specific object within the dictionary significantly depend on the length of the string used as a key?
Yes, it does. Finding an element in a dictionary involves two CPU-intensive steps:
A dictionary stores elements in buckets. To be able to do an O(1) lookup, the dictionary calculates the position in its internal array as hashCode modulo array.Length. This can give several elements the same index; elements sharing an index are stored together under it, which is called a bucket.
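The index calculation described above can be sketched as follows (a toy illustration, not the actual .NET implementation; the bucket count and keys are made up, and the sign-bit mask is needed because Java hash codes can be negative):

```java
public class BucketIndex {
    public static void main(String[] args) {
        int bucketCount = 8; // hypothetical internal array length

        String a = "/tmp/fileA.txt";
        String b = "/tmp/fileB.txt";

        // hashCode modulo array length picks the bucket; different keys
        // can land on the same index and then share a bucket.
        int indexA = (a.hashCode() & 0x7fffffff) % bucketCount;
        int indexB = (b.hashCode() & 0x7fffffff) % bucketCount;

        System.out.println("bucket of a: " + indexA);
        System.out.println("bucket of b: " + indexB);
    }
}
```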
For a string, the hash code is generated from all characters in the string, which means generating the hash code is O(n) in the string's length. When the string is long, it takes longer to generate the hash code. Comparing two strings for equality is done character by character. If these strings contain, let's say, 100,000 characters and only the last character differs, the comparison can take quite a lot of time. If they differ at the first character, the comparison returns false very quickly. Determining that two strings are in fact equal (when they aren't reference-equal) takes the most time, since the complete string needs to be traversed.
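The cost difference described above can be made visible with a hand-rolled character-by-character comparison that counts how many characters it inspects before reaching a verdict (a sketch in Java; real string equality is a heavily optimized intrinsic, but the asymptotics are the same):

```java
public class CompareCost {
    // Returns how many characters were examined before a verdict was reached.
    static int comparedChars(String x, String y) {
        int n = Math.min(x.length(), y.length());
        for (int i = 0; i < n; i++) {
            if (x.charAt(i) != y.charAt(i)) return i + 1; // early exit on mismatch
        }
        return n; // equal prefixes: every character was traversed
    }

    public static void main(String[] args) {
        String base = "a".repeat(100_000);

        String firstDiffers = "b" + base.substring(1);
        String lastDiffers  = base.substring(0, base.length() - 1) + "b";

        System.out.println(comparedChars(base, firstDiffers)); // 1
        System.out.println(comparedChars(base, lastDiffers));  // 100000
        System.out.println(comparedChars(base, base));         // 100000
    }
}
```

Mismatching at the first character costs one comparison; mismatching at the last character, or confirming equality, costs the full string length.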
If you can, keep key strings short when the dictionary is on a performance-critical path of the application.
Upvotes: 3
Reputation: 56697
EDIT
The wording of my answer was a bit misleading - trying to clear that up.
In theory, looking up in a dictionary/hashtable should be an O(1) process if the hash function produces ideal hash codes (meaning that for each unique input the output is also unique). If two input strings generate the same hash code (the hash function is not ideal), a list of entries ("bucket") is created for that hash code, which must then be searched item by item.
So, once the hash code is created, looking up the bucket in theory is an O(1) operation. Searching the bucket is an O(n) operation, where n is the number of elements in the bucket.
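The two phases described above, the O(1) jump to a bucket followed by the O(n) scan within it, can be sketched with a deliberately tiny toy hashtable (an illustration only; the class, bucket count, and keys are all made up, and real implementations resize and optimize far beyond this):

```java
import java.util.ArrayList;
import java.util.List;

public class ToyHashTable {
    static class Entry {
        final String key;
        final Object value;
        Entry(String key, Object value) { this.key = key; this.value = value; }
    }

    static final int BUCKETS = 4; // tiny on purpose, to force collisions

    @SuppressWarnings("unchecked")
    static final List<Entry>[] table = new List[BUCKETS];
    static { for (int i = 0; i < BUCKETS; i++) table[i] = new ArrayList<>(); }

    static int bucketOf(String key) {
        // Hashing is O(key length); the modulo picks the bucket in O(1).
        return (key.hashCode() & 0x7fffffff) % BUCKETS;
    }

    static void put(String key, Object value) {
        table[bucketOf(key)].add(new Entry(key, value));
    }

    static Object get(String key) {
        // O(1) jump to the bucket, then an O(n) scan over its entries,
        // with a full string comparison on each candidate.
        for (Entry e : table[bucketOf(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        put("/data/a.txt", 1);
        put("/data/b.txt", 2);
        System.out.println(get("/data/b.txt"));       // 2
        System.out.println(get("/data/missing.txt")); // null
    }
}
```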
The string length influences both the time it takes to compute the key's hash code (every character is processed) and the time it takes to compare the key against the entries in a bucket.
So: Yes, the string length actually does matter.
The real question is whether a dictionary is really the right tool in your situation, given that you say you frequently iterate over the data in the dictionary. In that case, using a list of objects (containing the file name and other data) and preventing the insertion of duplicates by searching for the file name upon each insertion might be way faster if you insert rarely but look up often.
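The list-based alternative suggested above can be sketched like this (a Java sketch under the stated assumptions; the `FileEntry` class, field names, and paths are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class FileList {
    // Hypothetical record holding the file name plus its data.
    static class FileEntry {
        final String path;
        final double[] data;
        FileEntry(String path, double[] data) { this.path = path; this.data = data; }
    }

    static final List<FileEntry> entries = new ArrayList<>();

    // O(n) duplicate check, but only paid on the rare insertions.
    static boolean add(String path, double[] data) {
        for (FileEntry e : entries) {
            if (e.path.equals(path)) return false; // duplicate, reject
        }
        entries.add(new FileEntry(path, data));
        return true;
    }

    public static void main(String[] args) {
        add("/data/run1.bin", new double[] {1, 2});
        boolean accepted = add("/data/run1.bin", new double[] {3, 4});
        System.out.println(accepted);        // false (duplicate rejected)

        // The frequent operation, iterating over all entries, is then a
        // plain walk over the list with no hashing involved.
        for (FileEntry e : entries) {
            System.out.println(e.path);
        }
    }
}
```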
Upvotes: 2