Reputation: 175
I implemented a HashTable whose bucket count is set when the class is initialized; it is simply an array of linked lists sized at runtime.
The issue is that a table with a small number of buckets, where the linked lists must be traversed (chains can reach roughly 5K nodes deep), is outperforming a table with three orders of magnitude more buckets.
int SMALL_BUCKET_SIZE = 10;
int BIG_BUCKET_SIZE = 10000;
HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);
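To make the structure concrete, here is a minimal sketch of the shape I mean (names like SketchHashTable and indexFor are illustrative only; the real code is in the linked HashTable.java):

// Minimal sketch of the structure described above: an array of linked-list
// buckets whose length is fixed at construction time. Names are illustrative;
// the actual implementation is in the linked HashTable.java.
public class SketchHashTable<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    public SketchHashTable(int bucketCount) {
        buckets = (Node<K, V>[]) new Node[bucketCount];   // sized at runtime
    }

    private int indexFor(K key) {
        // Map the hash onto the actual array length so every bucket is usable.
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; }   // overwrite existing key
        }
        buckets[i] = new Node<>(key, value, buckets[i]);          // prepend new node
    }

    public V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;                // walk the chain
        }
        return null;
    }
}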
I would expect the larger HashTable to be O(1) for search, while the smaller one, with its higher collision rate, should take more time traversing the linked nodes; yet my numbers below show the smaller table outperforming the wider one.
Fetch SmallTable: 0.000007
Fetch BigTable: 0.000018
So I decided to loop my HashTable.get a thousand times to account for JIT and JVM optimization. Now I'm starting to see numbers that seem to confirm what I'd expect.
Fetch SmallTable: 0.0000013630
Fetch BigTable: 0.0000002560
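The collision-rate intuition behind that expectation is just average chain length: 1000 keys spread uniformly over 10 buckets means roughly 100 nodes per chain, versus roughly 0.1 for 10000 buckets. A throwaway snippet for that arithmetic (assuming a uniform hash, which my real keys may not give):

// Back-of-the-envelope expected chain lengths under a uniform hash:
// an unsuccessful lookup inspects about n/m nodes on average.
public class ChainLengthCheck {
    public static void main(String[] args) {
        int keys = 1000;
        int smallBuckets = 10;
        int bigBuckets = 10000;
        System.out.printf("small table: ~%.1f nodes per bucket%n", (double) keys / smallBuckets); // ~100.0
        System.out.printf("big table:   ~%.1f nodes per bucket%n", (double) keys / bigBuckets);   // ~0.1
    }
}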
My question is about the soundness of my logic as well as the additional moving parts here. I've pasted my test below, alongside a link to the implementation of the HashTable and the underlying Node structure.
I'm looking for feedback from folks here with depth of experience who can weigh in on the variables that factor into this, such as key length, hash collision rates, bucket density, etc.
HashTableTest.java
@Test
public void canInitializeHashTableWithBucketsForPerformance() throws InterruptedException {
    double smallTableTime, bigTableTime;
    int SMALL_BUCKET_SIZE = 10;
    int BIG_BUCKET_SIZE = 10000;

    HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
    HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);

    List<String> strings = generateRandomStringKeys(1000);
    strings.forEach(string -> bigHashtTable.put(string, 10));
    strings.forEach(string -> smallHashTable.put(string, 10));

    Consumer<String> bigHashGet = bigHashtTable::get;
    Consumer<String> smallHashGet = smallHashTable::get;

    String theString = strings.get(strings.size() - 1);

    smallTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, smallHashGet);
    bigTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, bigHashGet);

    System.out.println(String.format("Fetch SmallTable: %.10f", smallTableTime));
    System.out.println(String.format("Fetch BigTable: %.10f", bigTableTime));

    assertTrue(smallTableTime > bigTableTime);
}

public double getElapsedTimeFactoringOutJavaOptimization(String s, Consumer<String> aMethod) {
    long start = 0, end = 0;
    for (int i = 0; i < 1000; i++) {
        start = System.nanoTime();
        aMethod.accept(s);
        end = System.nanoTime();
    }
    return (end - start) / 1_000_000_000D;
}

public List<String> generateRandomStringKeys(int numOfRandomKeys) {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < numOfRandomKeys; i++) {
        byte[] array = new byte[10];
        new Random().nextBytes(array);
        keys.add(new String(array, Charset.forName("UTF-8")));
    }
    return keys;
}
The test can be found here - Github - HashTableTest.java
The implementation can be found here as well - Github - HashTable.java
Upvotes: 0
Views: 95
Reputation: 198033
There are many things wrong here, but a handful include:
- Looping 1000 times and taking nanoTime for each of them does not make your benchmark valid. Seriously, use JMH. Or at least run it, like, ten million times.
- Your lookup is table[getHash(key) % RADIX], which basically means that however big the table is, you only use 10 buckets in it and pretend the rest don't exist.
- System.identityHashCode is not a useful hash function, especially on strings, especially when you're hoping to actually find an element that's in there... or not.
- You never use Node.next as a field, and might as well get rid of it.
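For reference, a bare-bones JMH benchmark for this kind of lookup might look roughly like the sketch below (it assumes the question's HashTable class is on the classpath; the warmup, measurement, and fork settings are illustrative, not tuned):

// Rough JMH sketch for comparing the two bucket counts. Build with the
// jmh-core and jmh-generator-annprocess dependencies and run via the
// generated benchmark runner.
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class HashTableBenchmark {

    HashTable<String, Integer> smallTable;
    HashTable<String, Integer> bigTable;
    String key;

    @Setup
    public void setup() {
        smallTable = new HashTable<>(10);
        bigTable = new HashTable<>(10000);
        // populate both tables with the same keys and remember one to look up
        for (int i = 0; i < 1000; i++) {
            String k = "key-" + i;
            smallTable.put(k, 10);
            bigTable.put(k, 10);
            key = k;
        }
    }

    @Benchmark
    public Integer getFromSmallTable() {
        return smallTable.get(key);   // returning the value prevents dead-code elimination
    }

    @Benchmark
    public Integer getFromBigTable() {
        return bigTable.get(key);
    }
}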
Upvotes: 1