Understanding Array#uniq method calls

If I use the following code:

class Item
  attr_reader :item_name, :qty

  def initialize(item_name, qty)
    @item_name = item_name
    @qty = qty
  end

  def to_s
    "Item (#{@item_name}, #{@qty})"
  end

  def hash
    p "hash has been called"
    self.item_name.hash ^ self.qty.hash
  end

  def eql?(other_item)
    puts "#eql? invoked"
    @item_name == other_item.item_name && @qty == other_item.qty
  end
end

p Item.new("abcd", 1).hash

items = [Item.new("abcd", 1), Item.new("abcd", 1), Item.new("abcd", 1)]
p items.uniq

I get the following output:

"hash has been called"
4379041107527942435
"hash has been called"
"hash has been called"
#eql? invoked
"hash has been called"
#eql? invoked
"hash has been called"
"hash has been called"
"hash has been called"
[Item (abcd, 1)]

I'm interpreting this to mean that the #hash method is used to generate a unique integer for each object, and that #eql? is then invoked to check whether those integers are equal, as a way of detecting duplicates. Is my interpretation correct?

Upvotes: 1

Views: 100

Answers (1)

Jörg W Mittag

Reputation: 369468

No, your interpretation is not correct.

hash does not generate unique integers, which is precisely why the eql? call is necessary. Also, eql? is not called on the integers but on the elements themselves.

This is just plain old hashing, the same mechanism that is used in Hash, Set, and SortedSet.

hash is a hash function, i.e. a function which maps a large (potentially infinite) input space to a smaller, fixed-size output space. Since the output space is smaller than the input space, there must necessarily be at least two distinct objects with the same hash code, thus the hash values are not unique! (This is called the Pigeonhole Principle. Intuitively: if you have two drawers and three socks, then there must be at least one drawer with at least two socks in it.)
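
To make the pigeonhole argument concrete, here is a minimal sketch. The `tiny_hash` method is a made-up toy hash function for illustration only (it is not how Ruby's built-in hash works); it squeezes any string into one of four possible values, so five inputs cannot all receive distinct codes.

```ruby
# Hypothetical toy hash function: maps any string to one of 4 buckets (0..3).
def tiny_hash(str)
  str.sum % 4
end

words = %w[a b c d e]                    # 5 inputs, but only 4 possible outputs
codes = words.map { |w| tiny_hash(w) }

# With more inputs than buckets, at least two inputs must share a code.
collision = codes.uniq.size < words.size
puts collision   # prints "true"
```

The real hash method has a far larger output space, but the same argument applies: there are more possible objects than possible integers of a fixed size.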

Because hash values are not unique, two identical hash values on their own prove nothing. If two hash values are different, then you know for certain that the two objects are also different. But if two hash values are the same, the objects could still be different (this is called a hash collision), which is why you have to double-check using eql?.
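
A collision is easy to provoke with the same XOR-style hash that the question uses. The following is only a sketch: `Pair` is a hypothetical class invented for this illustration. Because XOR is commutative, swapping the two fields gives the same hash code, yet the objects are not eql?, so uniq keeps both of them.

```ruby
# Hypothetical class for illustration; it mirrors the XOR-based hash from the question.
class Pair
  attr_reader :a, :b

  def initialize(a, b)
    @a = a
    @b = b
  end

  # XOR is commutative, so Pair.new(1, 2) and Pair.new(2, 1) collide.
  def hash
    a.hash ^ b.hash
  end

  # Called on the elements themselves, not on their hash codes.
  def eql?(other)
    other.is_a?(Pair) && a == other.a && b == other.b
  end
end

x = Pair.new(1, 2)
y = Pair.new(2, 1)   # same hash code as x, but a different object
z = Pair.new(1, 2)   # same hash code as x, and eql? to x

same_hash = x.hash == y.hash
result = [x, y, z].uniq

puts same_hash     # prints "true"
puts result.size   # prints "2"
```

If uniq compared only the integers, y would be wrongly discarded as a duplicate of x. The eql? check on the elements is what keeps it.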

Upvotes: 4
