Reputation: 9855
If I have a set number of String
's that I want to check for in a free form field (computer generated, so could be many per second) which would be a faster implementation?
private static HashSet<String> values = new HashSet<String>();
static {
... add 5 Strings to the Set
}
public void someMethod() {
if (values.contains(enteredValue))
...
}
Or doing the if with 5 String.equals
||?
It seems like a no-brainer to me but maybe I'm wrong. Any disadvantages to one and not the other?
Upvotes: 2
Views: 5035
Reputation: 592
Came here looking for the answer after finding that the string.equals
was faster - and was against what I had expected. But here is the code and a result. Feedback please...
This was the result in nanoseconds:
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
public class Main {
HashMap<Integer, String> testKeys = new HashMap<>();
HashSet<String> keys2 = new HashSet<>();
String key1 = null;
String key2 = null;
static HashMap<String, Long> results = new HashMap<String, Long>();
public static void main(String[] args) {
var test1 = new Main();
var test2 = new Main();
test1.init();
test2.init();
Instant start, finish;
long timeElapsed;
for (int i = 0; i < 10; i++) {
start = Instant.now();
test1.stringCompare();
finish = Instant.now();
timeElapsed = getElapsed(start, finish);
addResult("stringCompare", timeElapsed);
System.out.println("test1.run1 stringCompare time elapsed: " + timeElapsed);
start = Instant.now();
test2.stringCompare();
finish = Instant.now();
timeElapsed = getElapsed(start, finish);
addResult("stringCompare", timeElapsed);
System.out.println("test2.run1 stringCompare time elapsed: " + timeElapsed);
//
start = Instant.now();
test1.mapSetContains();
finish = Instant.now();
timeElapsed = getElapsed(start, finish);
addResult("mapSetContains", timeElapsed);
System.out.println("test2.run1 mapSetContains time elapsed: " + timeElapsed);
start = Instant.now();
test2.mapSetContains();
finish = Instant.now();
timeElapsed = getElapsed(start, finish);
addResult("mapSetContains", timeElapsed);
System.out.println("test2.run2 mapSetContains time elapsed: " + timeElapsed);
}
emitResults();
}
static void emitResults(){
for (String item : results.keySet()) {
System.out.println("item: " + item + " " + results.get(item));
}
}
static void addResult(String testType, Long result) {
if (results.containsKey(testType))
results.put(testType, results.get(testType) + result);
else
results.put(testType, result);
}
static long getElapsed(Instant start, Instant finish) {
long timeElapsed;
timeElapsed = Duration.between(start, finish).toNanos(); // .toMillis();
return timeElapsed;
}
void stringCompare() {
for (int i = 0; i < testKeys.size(); i++) {
String testKey = testKeys.get(i);
boolean rv = testKey.equals(key1) || testKey.equals(key2);
}
}
void mapSetContains() {
for (int i = 0; i < testKeys.size(); i++) {
String testKey = testKeys.get(i);
boolean rv = keys2.contains(testKey);
}
}
void init() {
for (int i = 0; i < 4; i++) {
testKeys.put(i, makeUiid());
}
key1 = testKeys.get(0);
key2 = testKeys.get(1);
keys2.add(testKeys.get(2));
keys2.add(testKeys.get(3));
}
String makeUiid() {
return java.util.UUID.randomUUID().toString();
}
}
Upvotes: 0
Reputation: 23465
String source code:
Hash-relevant code:
/** Cache the hash code for the string */
private int hash; // Default to 0
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31 * h + val[off++];
}
hash = h;
}
return h;
}
Equals code:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String) anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
}
So, each one involves a single loop over all characters in the string, the hash is only computed once for each string, but unlike the hash-computing loop, the equals loop gets to preemptively exit at the first character mismatch, and moreover the equals loop doesn't even happen if the strings have different lengths.
My gut feeling is that unless you're comparing the same strings to the same strings over and over, equals wins.
Tough call. Do a benchmark if you really want to know which one is faster for your application.
Upvotes: 2
Reputation: 234795
If you sort the strings and do a binary search, then you will do a maximum of three compareTo
tests. If you use a HashSet, you will have to compute the hash for the test string and do at least one equals
test (if it matches a hashcode) or no equals
test (for a miss). It's not at all clear to me whether there is going to be a big difference here, and the actual performance may depend on secondary issues like the level of optimization.
The answer, like always for this kind of question, is to benchmark.
Upvotes: 0
Reputation: 11308
HashSet
will not necessarily be faster, but the time will be constant. Quoting from the Java Documentation.
This class offers constant time performance for the basic operations (add, remove, contains and size)
So, if you add more Strings to be searched for the value, if you use equals the time will be relative to the number n of Strings but with a HashSet
it will remain constant.
Upvotes: 0
Reputation: 54148
It will depend on the length, content and number of your strings.
If the strings are few and randomly-populated, then there's a good chance that simple comparison will find a mismatch within one or two characters, only checking further when the contents do match in full. Compared to the overhead of HashSet
maintenance and hash-code generation (full string every time) I'd bet on the simple compare.
If strings are likely to be similar, or more numerous, HashSet
would be better.
[Note that answers presuming HashSet
will be faster ignore the fact that you have to generate the hashcode for every addition to the HashSet
, not just for lookups. This fact does not matter if your reference strings do not change over time, though.]
Upvotes: 1
Reputation: 13289
From http://en.wikipedia.org/wiki/Java_hashCode%28%29#The_java.lang.String_hash_function
From Java 1.2, java.lang.String class implements its hashCode() using a product sum algorithm over the entire text of the string.
Wild guess here, but I don't think there is going to be much difference, since computing the hash itself is roughly as costly as doing the direct string compare, and you may have to deal with collisions.
Upvotes: 0
Reputation: 42597
There's only one way to be sure - benchmark it with realistic values.
Upvotes: 1
Reputation: 138874
I believe the HashSet
would be faster because it will hash your string once, and then do 5 integer comparisons. That should be faster that doing 5 String
comparisons.
That being said, I suggest you just choose one way and try it out. If it doesn't perform fast enough, then worry about optimizing it more.
Upvotes: 5