Gandalf
Gandalf

Reputation: 9855

Comparing A HashSet<String> to String.equals(...)

If I have a set number of String's that I want to check for in a free form field (computer generated, so could be many per second) which would be a faster implementation?

private static HashSet<String> values = new HashSet<String>();
static {
   ... add 5 Strings to the Set
}
public void someMethod() {
   if (values.contains(enteredValue))
   ...
}

Or doing the if with 5 String.equals ||?

It seems like a no-brainer to me but maybe I'm wrong. Any disadvantages to one and not the other?

Upvotes: 2

Views: 5035

Answers (8)

Shawn Cicoria
Shawn Cicoria

Reputation: 592

Came here looking for the answer after finding that the string.equals was faster - and was against what I had expected. But here is the code and a result. Feedback please...

This was the result in nanoseconds:

  • mapSetContains 166000
  • stringCompare 143000
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;

public class Main {

    HashMap<Integer, String> testKeys = new HashMap<>();
    HashSet<String> keys2 = new HashSet<>();
    String key1 = null;
    String key2 = null;

    static HashMap<String, Long> results = new HashMap<String, Long>();

    public static void main(String[] args) {

        var test1 = new Main();
        var test2 = new Main();

        test1.init();
        test2.init();

        Instant start, finish;
        long timeElapsed;

        for (int i = 0; i < 10; i++) {

            start = Instant.now();
            test1.stringCompare();
            finish = Instant.now();
            timeElapsed = getElapsed(start, finish);

            addResult("stringCompare", timeElapsed);
            System.out.println("test1.run1 stringCompare time elapsed: " + timeElapsed);

            start = Instant.now();
            test2.stringCompare();
            finish = Instant.now();
            timeElapsed = getElapsed(start, finish);
            addResult("stringCompare", timeElapsed);
            System.out.println("test2.run1 stringCompare time elapsed: " + timeElapsed);

            //

            start = Instant.now();
            test1.mapSetContains();
            finish = Instant.now();
            timeElapsed = getElapsed(start, finish);
            addResult("mapSetContains", timeElapsed);
            System.out.println("test2.run1 mapSetContains time elapsed: " + timeElapsed);

            start = Instant.now();
            test2.mapSetContains();
            finish = Instant.now();
            timeElapsed = getElapsed(start, finish);
            addResult("mapSetContains", timeElapsed);
            System.out.println("test2.run2 mapSetContains time elapsed: " + timeElapsed);
        }

        emitResults();

    }


    static void emitResults(){
        for (String item : results.keySet()) {
            System.out.println("item: " + item + "  " + results.get(item));
            
        }
    }

    static void addResult(String testType, Long result) {
        if (results.containsKey(testType))
            results.put(testType, results.get(testType) + result);
        else
            results.put(testType, result);
    }

    static long getElapsed(Instant start, Instant finish) {
        long timeElapsed;
        timeElapsed = Duration.between(start, finish).toNanos(); // .toMillis();
        return timeElapsed;
    }

    void stringCompare() {

        for (int i = 0; i < testKeys.size(); i++) {

            String testKey = testKeys.get(i);
            boolean rv = testKey.equals(key1) || testKey.equals(key2);
        }

    }

    void mapSetContains() {

        for (int i = 0; i < testKeys.size(); i++) {

            String testKey = testKeys.get(i);
            boolean rv = keys2.contains(testKey);
        }

    }

    void init() {

        for (int i = 0; i < 4; i++) {
            testKeys.put(i, makeUiid());
        }

        key1 = testKeys.get(0);
        key2 = testKeys.get(1);

        keys2.add(testKeys.get(2));
        keys2.add(testKeys.get(3));
    }

    String makeUiid() {
        return java.util.UUID.randomUUID().toString();
    }
}

Upvotes: 0

trutheality
trutheality

Reputation: 23465

String source code:

Hash-relevant code:

/** Cache the hash code for the string */
private int hash; // Default to 0

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31 * h + val[off++];
        }
        hash = h;
    }
    return h;
}

Equals code:

public boolean equals(Object anObject) {
    if (this  == anObject) {
        return true;
    }
    if (anObject instanceof  String) {
        String anotherString = (String) anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

So, each one involves a single loop over all characters in the string, the hash is only computed once for each string, but unlike the hash-computing loop, the equals loop gets to preemptively exit at the first character mismatch, and moreover the equals loop doesn't even happen if the strings have different lengths.

My gut feeling is that unless you're comparing the same strings to the same strings over and over, equals wins.

Tough call. Do a benchmark if you really want to know which one is faster for your application.

Upvotes: 2

Ted Hopp
Ted Hopp

Reputation: 234795

If you sort the strings and do a binary search, then you will do a maximum of three compareTo tests. If you use a HashSet, you will have to compute the hash for the test string and do at least one equals test (if it matches a hashcode) or no equals test (for a miss). It's not at all clear to me whether there is going to be a big difference here, and the actual performance may depend on secondary issues like the level of optimization.

The answer, like always for this kind of question, is to benchmark.

Upvotes: 0

Marcelo
Marcelo

Reputation: 11308

HashSet will not necessarily be faster, but the time will be constant. Quoting from the Java Documentation.

This class offers constant time performance for the basic operations (add, remove, contains and size)

So, if you add more Strings to be searched for the value, if you use equals the time will be relative to the number n of Strings but with a HashSet it will remain constant.

Upvotes: 0

Steve Townsend
Steve Townsend

Reputation: 54148

It will depend on the length, content and number of your strings.

If the strings are few and randomly-populated, then there's a good chance that simple comparison will find a mismatch within one or two characters, only checking further when the contents do match in full. Compared to the overhead of HashSet maintenance and hash-code generation (full string every time) I'd bet on the simple compare.

If strings are likely to be similar, or more numerous, HashSet would be better.

[Note that answers presuming HashSet will be faster ignore the fact that you have to generate the hashcode for every addition to the HashSet, not just for lookups. This fact does not matter if your reference strings do not change over time, though.]

Upvotes: 1

dfb
dfb

Reputation: 13289

From http://en.wikipedia.org/wiki/Java_hashCode%28%29#The_java.lang.String_hash_function

From Java 1.2, java.lang.String class implements its hashCode() using a product sum algorithm over the entire text of the string.

Wild guess here, but I don't think there is going to be much difference, since computing the hash itself is roughly as costly as doing the direct string compare, and you may have to deal with collisions.

Upvotes: 0

DNA
DNA

Reputation: 42597

There's only one way to be sure - benchmark it with realistic values.

Upvotes: 1

jjnguy
jjnguy

Reputation: 138874

I believe the HashSet would be faster because it will hash your string once, and then do 5 integer comparisons. That should be faster that doing 5 String comparisons.

That being said, I suggest you just choose one way and try it out. If it doesn't perform fast enough, then worry about optimizing it more.

Upvotes: 5

Related Questions