rgbk21
rgbk21

Reputation: 173

Time Complexity of checking whether a string is present in a HashSet Java

In a nutshell, I am confused whether the time complexity of checking whether a String exists in a HashSet is O(1) or O(m), m being the length of the longest string being checked.

I understand that the bucket in which the string is going to be placed is determined by the hashCode() method of the string modulo the size of the HashSet. So this means that in order to find out if a string exists in a HashSet, the hashCode needs to be calculated.

In the hashCode method of the String class, I can see that you have to iterate over the entire string to calculate this value. But I also see that there is an option of caching this value.

This is where I am confused. Is the hashcode of a String cached during the creation of a String? Or is it cached when we explicitly call the hashcode method? Does implicit call to the hashCode method (as in the case of checking whether the String exists in a Set) also cache the value?

What would be the time complexity of the very first existence check for a String in HashSet?

Thank you.

EDIT: Since it seems I did not explain what I am asking well enough: I am not talking about time complexity if chaining occurs (which would happen if there is hash collisions) / or when the Hash Set is resized. Assume for now that there are no collisions in the Hash Set. Hence size of every bucket is at max 1. In this case, if I am checking if a String is present in the Hash Set, will it take O(1) time, or O(m) time (because the string can have O(m) characters in the worst case and calculating hash code requires traversal of the entire string)

Upvotes: 2

Views: 1352

Answers (2)

Bohemian
Bohemian

Reputation: 425398

If the String you’re checking is a new String, its hashCode must be calculated to find its bucket, so that would be O(m) for the hashCode calculation and O(1) for the existence check, so O(m).

If the String object already exists in the Set (or its hashCode has otherwise been calculated), checking it is O(1), because its hashCode is cached.

Upvotes: 2

WJS
WJS

Reputation: 40057

The time to calculate a given hashCode is dependent on the algorithm and whether it needs to be calculated per call. An Integer makes no calculations as the hashCode is the int value. A String would be O(n) where n is the number of characters.

Here is the hashCode algorithm for Latin1 strings.

public static int hashCode(byte[] value) {
        int h = 0;
        for (byte v : value) {
            h = 31 * h + (v & 0xff);
        }
        return h;
    }

However, the hashCode for a String is only calculated once since the String is immutable. So hashCode calls to a String would return the cached value.

Upvotes: 0

Related Questions