Reputation: 621

Comparison of these two algorithms?

So I'm presented with a problem that states: "Determine if a string contains all unique characters."

So I wrote up this solution that adds each character to a set, returning false if the character is already present.

private static boolean allUniqueCharacters(String s) {
    Set<Character> charSet = new HashSet<Character>();
    for (int i = 0; i < s.length(); i++) {
        char currentChar = s.charAt(i);
        if (!charSet.contains(currentChar)) {
            charSet.add(currentChar);
        } else {
            // character seen before, so not all unique
            return false;
        }
    }
    return true;
}

According to the book I am reading, this is the "optimal solution":

public static boolean isUniqueChars2(String str) {
    // With only 128 possible characters, a longer string must repeat one
    if (str.length() > 128)
        return false;

    boolean[] char_set = new boolean[128];

    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);

        if (char_set[val]) {
            return false;
        }
        char_set[val] = true;
    }

    return true;
}

My question is: is my implementation slower than the one presented? I assume it is, but if a hash lookup is O(1), wouldn't they have the same complexity?

Thank you.

Upvotes: 17

Views: 1899

Answers (6)

allo

Reputation: 4236

The bottleneck of your implementation is that a set has a lookup (and insert) complexity* of O(log k), while the array has O(1) lookup complexity.

This sounds as though your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (otherwise the reference implementation would be wrong and produce an out-of-bounds error) and can therefore be treated as a constant. This makes the set lookup O(1) as well, albeit with somewhat larger constants than the array lookup.

* assuming a sane implementation such as a tree or hash map. The hash map's time complexity is in general not constant, since filling it up needs log(n) resize operations to avoid the build-up of collisions that would lead to linear lookup time; see e.g. here and here for answers on Stack Overflow.

This article even explains that Java 8 converts an over-full HashMap bucket to a binary tree (O(n log n) for the conversion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.
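
As an aside, if the alphabet is bounded as described, those resize operations can be avoided entirely by pre-sizing the set. A minimal sketch of that idea (the capacity of 256 and the method name are my assumptions: 128 distinct keys divided by HashSet's default 0.75 load factor, rounded up to a power of two):

import java.util.HashSet;
import java.util.Set;

private static boolean allUniquePresized(String s) {
    // Capacity 256 (assumption) holds 128 distinct chars without
    // ever triggering the resize operations mentioned above.
    Set<Character> seen = new HashSet<>(256);
    for (int i = 0; i < s.length(); i++) {
        // add() returns false if the element was already present
        if (!seen.add(s.charAt(i))) {
            return false;
        }
    }
    return true;
}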

Upvotes: 0

user1196549

Reputation:

The hashmap is acceptable in theory, but it is wasteful.

A hashmap is built on top of an array (so it is certainly more costly than a plain array), and collision resolution requires extra space (at least double the number of elements). In addition, any access requires computing the hash and possibly resolving collisions.

This adds a lot of overhead in terms of space and time, compared to a straight array.

Also note that it is something of folklore that a hash table has O(1) behavior. The worst case is much poorer: accesses can take up to O(N) time for a table of size N.


As a final remark, the time complexity of this algorithm is O(1), because you conclude false at worst when N > 128.
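
To make that concrete, here is a sketch (not from the answer itself) of the asker's set-based version with the same guard added. By the pigeonhole principle, any string longer than the assumed 128-character alphabet must contain a duplicate, so the loop body runs at most 128 times:

import java.util.HashSet;
import java.util.Set;

private static boolean allUniqueCharactersBounded(String s) {
    // Assumes a 128-character alphabet, like the book's solution
    if (s.length() > 128)
        return false; // pigeonhole: some character must repeat
    Set<Character> charSet = new HashSet<>();
    for (int i = 0; i < s.length(); i++) {
        if (!charSet.add(s.charAt(i))) // false means duplicate
            return false;
    }
    return true;
}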

Upvotes: 2

ernest_k

Reputation: 45339

Both algorithms have time complexity of O(N). The difference is in their space complexity.

The book's solution will always require storage for 128 characters, which is O(1), while your solution's space requirement varies linearly with the input, which is O(N).

The book's space requirement is based on an assumed character set of 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets; one way to make that assumption explicit is sketched below.
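
A hedged sketch of that idea: keep the book's O(1) space but size a java.util.BitSet from a named constant, so switching character sets means changing one number (ALPHABET_SIZE and the method name are my own, not from the answer):

import java.util.BitSet;

static final int ALPHABET_SIZE = 128; // assumption: 7-bit ASCII

static boolean isUniqueInAlphabet(String str) {
    if (str.length() > ALPHABET_SIZE)
        return false; // longer strings must repeat a character
    BitSet seen = new BitSet(ALPHABET_SIZE); // one bit per character
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);
        if (seen.get(val))
            return false;
        seen.set(val);
    }
    return true;
}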

Upvotes: 3

Sweeper

Reputation: 274413

As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.

Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.

For the same string, the "optimal" solution should be faster because sets carry overhead that plain arrays do not; handling arrays is faster than handling sets. However, to actually make the "optimal" solution work for any char, you would need an array of length 2^16, which is how many different char values there are, and you would also need to remove the check for a string longer than 128 (see the sketch below).
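
A sketch of that full-range variant (the method name is mine; 65536 = 2^16 covers every possible char value, and the 128-length guard is dropped as described):

public static boolean isUniqueCharsFullRange(String str) {
    // One flag for each of the 65536 possible char values
    boolean[] char_set = new boolean[65536];
    for (int i = 0; i < str.length(); i++) {
        char val = str.charAt(i);
        if (char_set[val]) {
            return false;
        }
        char_set[val] = true;
    }
    return true;
}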

This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.

Upvotes: 12

Jon Thoms

Reputation: 10797

Your solution could indeed be slower than the book's solution. First, a hash lookup ideally takes constant time, but retrieving the object will not be constant time if there are multiple hash collisions. Second, even with a constant-time lookup, there is usually significant overhead in executing the hash code function compared to looking up an array element by index. That's why you may want to go with the array lookup. However, if you start dealing with non-ASCII Unicode characters, then you might not want the array approach due to its significant space overhead.
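
For example, a set keyed on Unicode code points handles characters beyond the 16-bit char range without allocating a 2^16-entry (or larger) array; a sketch of that idea, not taken from the answer:

import java.util.HashSet;
import java.util.Set;

static boolean allUniqueCodePoints(String s) {
    Set<Integer> seen = new HashSet<>();
    // codePoints() yields whole code points, so surrogate pairs
    // (e.g. emoji) are each treated as a single character.
    // allMatch short-circuits: add() returns false on a duplicate.
    return s.codePoints().allMatch(seen::add);
}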

Upvotes: 1

amerykanin

Reputation: 255

Your algorithm is also O(1). You can think of complexity as how an algorithm reacts to a change in the number of elements processed. Therefore O(n) and O(2n) are effectively equal.

People are talking about O notation as a growth rate here.

Upvotes: 1
