Reputation: 16462
How do hashes work in programming? How I think of a hash is something that allows me the ability to use some unique value to retrieve some data. Like if we have an array and I start to put things in the array, if I have another variable that keeps track of what item is in slot 0,1,2... then I have that instant ability to find an item. Is that hashing?
What is the purpose of a hash?
When should a hash be implemented? What's a hash similar to in terms of data structure?
What I think I know about hashes is that it allows us the ability to retrieve the item within O(1). Is that correct?
Upvotes: 14
Views: 12840
Reputation: 210352
A hash is like a person's first name -- it's a short way of remembering a person, even though it doesn't have to be unique. If you need to find some information about someone, you can just look them up by their name, and you only need to perform other checks if two or more people have the same name.
That's the power of hashing, and just as remembering people is much easier by name than by Social Security Number, finding an object by its hash code is much easier than actually comparing the object to everything already in your collection.
Now, in this example, if you're looking someone up in a phone book by name, you'd probably find them in O(log n) time, because the names are sorted alphabetically, and because you need to do a binary search. If, however, you instead "hashed" 100 people born in the 1900s by their years of birth, then you'd only need at most 4 comparisons in the hashtable/phonebook (one per digit) to find any one year by hash, which is constant time. Then, if two people are born in the same year, you can use other information to find the person you need, and on average, if your table isn't too full (say, if you have at most 50 people for 100 different years of birth), your lookups will be constant-time.
(If your table gets more than, say, 50% full, you can always double its size, to keep the number of collisions low and hence to keep your lookups fast.)
More information:
If you've ever heard of MD5 or SHA-1 SHA-2 hashes for files, they're like the "fingerprints" of the file. While it's possible to have two files with the same hash, this is made so unlikely that, for practical purposes, it's impossible; hence, if you have the hash of two files, you can compare the files by their fingerprints rather than by their data, which is immensely faster.
Upvotes: 14
Reputation: 837886
A hash map / dictionary is a key/value data structure that stores objects in buckets based on the value of a hash function. These keys must be unique but the hash function values (sometimes called hashcodes) aren't necessarily unique.
Like if we have an array and I start to put htings in the array, if I have another varible that keeps track of what item is in slot 0,1,2... then I have that instant ability to find an item. Is that hashing?
No. A hash function is a deterministic function that always gives the same value for an object. The hash code does not change depending on where the object is stored.
What I think I know about hashes is that it allows us the ability to retrieve the item within O(1). Is that correct?
Nearly. A dictionary has O(1) complexity for lookups if there are not too many hash code collisions. However if the hash function is poor and every object has the same hash value then a dictionary could have O(n) performance instead.
Upvotes: 8
Reputation: 41077
A hash
makes it fast to lookup instead of iterating over an array or tree. It makes it possible search O(1)
time with little use of memory.
Upvotes: 1