Reputation: 173
In a nutshell, I am confused whether the time complexity of checking whether a String exists in a HashSet is O(1) or O(m), m being the length of the longest string being checked.
I understand that the bucket in which the string is going to be placed is determined by the hashCode()
method of the string modulo the size of the HashSet. So this means that in order to find out if a string exists in a HashSet, the hashCode needs to be calculated.
In the hashCode
method of the String
class, I can see that you have to iterate over the entire string to calculate this value. But I also see that there is an option of caching this value.
This is where I am confused. Is the hashcode of a String cached during the creation of a String? Or is it cached when we explicitly call the hashcode
method? Does implicit call to the hashCode
method (as in the case of checking whether the String exists in a Set) also cache the value?
What would be the time complexity of the very first existence check for a String in HashSet?
Thank you.
EDIT: Since it seems I did not explain what I am asking well enough: I am not talking about time complexity if chaining occurs (which would happen if there is hash collisions) / or when the Hash Set is resized. Assume for now that there are no collisions in the Hash Set. Hence size of every bucket is at max 1. In this case, if I am checking if a String is present in the Hash Set, will it take O(1) time, or O(m) time (because the string can have O(m) characters in the worst case and calculating hash code requires traversal of the entire string)
Upvotes: 2
Views: 1352
Reputation: 425398
If the String you’re checking is a new String, its hashCode must be calculated to find its bucket, so that would be O(m) for the hashCode calculation and O(1) for the existence check, so O(m).
If the String object already exists in the Set (or its hashCode has otherwise been calculated), checking it is O(1), because its hashCode is cached.
Upvotes: 2
Reputation: 40057
The time to calculate a given hashCode is dependent on the algorithm and whether it needs to be calculated per call. An Integer
makes no calculations as the hashCode is the int
value. A String would be O(n)
where n
is the number of characters.
Here is the hashCode algorithm for Latin1
strings.
public static int hashCode(byte[] value) {
int h = 0;
for (byte v : value) {
h = 31 * h + (v & 0xff);
}
return h;
}
However, the hashCode for a String is only calculated once since the String is immutable. So hashCode calls to a String would return the cached value.
Upvotes: 0