AJFaraday
AJFaraday

Reputation: 2450

What does the Ruby `uniq` method use for equality checking?

I'm interested in implementing a custom equality method for use in an array of objects in Ruby. Here's a stripped-back example:

class Foo

  attr_accessor :a, :b

  def initialize(a, b)
    @a = a
    @b = b
  end 

  def ==(other)
    puts 'doing comparison'
    @a == @a && @b == @b
  end

  def to_s
    "#{@a}: #{@b}"  
  end

end 

a = [
  Foo.new(1, 1),
  Foo.new(1, 2),
  Foo.new(2, 1),
  Foo.new(2, 2),
  Foo.new(2, 2)
]
a.uniq

I expected the uniq method to call Foo#==, and remove the last instance of Foo. Instead, I don't see the 'doing comparison' debug line and the array remains the same length.

Notes:

Upvotes: 7

Views: 1535

Answers (2)

mrzasa
mrzasa

Reputation: 23327

It compares values using their hash and eql? methods for efficiency.

https://ruby-doc.org/core-2.5.0/Array.html#method-i-uniq-3F

So you should override eql? (that is ==) and hash

UPDATE:

I cannot explain fully why is that, but overriding hash and == doesn't work. I guess it's cause by the way uniq is implemented in C:

From: array.c (C Method): Owner: Array Visibility: public Number of lines: 20

static VALUE
rb_ary_uniq(VALUE ary)
{
    VALUE hash, uniq;

    if (RARRAY_LEN(ary) <= 1)
        return rb_ary_dup(ary);
    if (rb_block_given_p()) {
        hash = ary_make_hash_by(ary);
        uniq = rb_hash_values(hash);
    }
    else {
        hash = ary_make_hash(ary);
        uniq = rb_hash_values(hash);
    }
    RBASIC_SET_CLASS(uniq, rb_obj_class(ary));
    ary_recycle_hash(hash);

    return uniq;
}

You can bypass that by using a block version of uniq:

> [Foo.new(1,2), Foo.new(1,2), Foo.new(2,3)].uniq{|f| [f.a, f.b]}
=> [#<Foo:0x0000562e48937cc8 @a=1, @b=2>, #<Foo:0x0000562e48937c78 @a=2, @b=3>]

Or use Struct instead:

F = Struct.new(:a, :b)
[F.new(1,2), F.new(1,2), F.new(2,3)].uniq
# => [#<struct F a=1, b=2>, #<struct F a=2, b=3>]

UPDATE2:

Actually in terms of overriding it's not the same if you override == or eql?. When I overriden eql? It worked as intended:

class Foo
  attr_accessor :a, :b

  def initialize(a, b)
    @a = a
    @b = b
  end 

  def eql?(other)
    (@a == other.a && @b == other.b)
  end

  def hash
    [a, b].hash
  end

  def to_s
    "#{@a}: #{@b}"  
  end

end 

a = [
  Foo.new(1, 1),
  Foo.new(1, 2),
  Foo.new(2, 1),
  Foo.new(2, 2),
  Foo.new(2, 2)
]
a.uniq
#=> [#<Foo:0x0000562e483bff70 @a=1, @b=1>,
#<Foo:0x0000562e483bff48 @a=1, @b=2>,
#<Foo:0x0000562e483bff20 @a=2, @b=1>,
#<Foo:0x0000562e483bfef8 @a=2, @b=2>]

Upvotes: 10

J&#246;rg W Mittag
J&#246;rg W Mittag

Reputation: 369468

You can find the answer in the documentation of Array#uniq (for some reason, it is not mentioned in the documentation of Enumerable#uniq):

It compares values using their hash and eql? methods for efficiency.

The contracts of hash and eql? are as follows:

  • hash returns an Integer that must be the same for objects which are considered equal, but does not necessarily have to be different for objects that are not equal. This means that different hashes mean that the objects are definitely not equal, but the same hash doesn't tell you anything. Ideally, hash should also be resistant to accidental and deliberate collisions.
  • eql? is value equality, usually stricter than == but less strict than equal? which is more or less identity: equal? should only return true if you compare an object to itself.

uniq? uses the same trick that is used in hash tables, hash sets, etc. to speed up lookups:

  1. Compare the hashes. Computing a hash should normally be fast.
  2. If the hashes are identical, then, and only then double-check using eql?.

Upvotes: 6

Related Questions