Reputation: 2450
I'm interested in implementing a custom equality method for use in an array of objects in Ruby. Here's a stripped-back example:
class Foo
attr_accessor :a, :b
def initialize(a, b)
@a = a
@b = b
end
def ==(other)
puts 'doing comparison'
@a == @a && @b == @b
end
def to_s
"#{@a}: #{@b}"
end
end
a = [
Foo.new(1, 1),
Foo.new(1, 2),
Foo.new(2, 1),
Foo.new(2, 2),
Foo.new(2, 2)
]
a.uniq
I expected the uniq method to call Foo#==
, and remove the last instance of Foo. Instead, I don't see the 'doing comparison' debug line and the array remains the same length.
Notes:
===
a.uniq{|x| [x.a, x.b]}
, but I don't like this solution it's making the code look pretty cluttered.Upvotes: 7
Views: 1535
Reputation: 23327
It compares values using their hash and eql? methods for efficiency.
https://ruby-doc.org/core-2.5.0/Array.html#method-i-uniq-3F
So you should override eql?
(that is ==
) and hash
UPDATE:
I cannot explain fully why is that, but overriding hash
and ==
doesn't work. I guess it's cause by the way uniq
is implemented in C:
From: array.c (C Method): Owner: Array Visibility: public Number of lines: 20
static VALUE
rb_ary_uniq(VALUE ary)
{
VALUE hash, uniq;
if (RARRAY_LEN(ary) <= 1)
return rb_ary_dup(ary);
if (rb_block_given_p()) {
hash = ary_make_hash_by(ary);
uniq = rb_hash_values(hash);
}
else {
hash = ary_make_hash(ary);
uniq = rb_hash_values(hash);
}
RBASIC_SET_CLASS(uniq, rb_obj_class(ary));
ary_recycle_hash(hash);
return uniq;
}
You can bypass that by using a block version of uniq:
> [Foo.new(1,2), Foo.new(1,2), Foo.new(2,3)].uniq{|f| [f.a, f.b]}
=> [#<Foo:0x0000562e48937cc8 @a=1, @b=2>, #<Foo:0x0000562e48937c78 @a=2, @b=3>]
Or use Struct
instead:
F = Struct.new(:a, :b)
[F.new(1,2), F.new(1,2), F.new(2,3)].uniq
# => [#<struct F a=1, b=2>, #<struct F a=2, b=3>]
UPDATE2:
Actually in terms of overriding it's not the same if you override ==
or eql?
. When I overriden eql?
It worked as intended:
class Foo
attr_accessor :a, :b
def initialize(a, b)
@a = a
@b = b
end
def eql?(other)
(@a == other.a && @b == other.b)
end
def hash
[a, b].hash
end
def to_s
"#{@a}: #{@b}"
end
end
a = [
Foo.new(1, 1),
Foo.new(1, 2),
Foo.new(2, 1),
Foo.new(2, 2),
Foo.new(2, 2)
]
a.uniq
#=> [#<Foo:0x0000562e483bff70 @a=1, @b=1>,
#<Foo:0x0000562e483bff48 @a=1, @b=2>,
#<Foo:0x0000562e483bff20 @a=2, @b=1>,
#<Foo:0x0000562e483bfef8 @a=2, @b=2>]
Upvotes: 10
Reputation: 369468
You can find the answer in the documentation of Array#uniq
(for some reason, it is not mentioned in the documentation of Enumerable#uniq
):
It compares values using their
hash
andeql?
methods for efficiency.
The contracts of hash
and eql?
are as follows:
hash
returns an Integer
that must be the same for objects which are considered equal, but does not necessarily have to be different for objects that are not equal. This means that different hashes mean that the objects are definitely not equal, but the same hash doesn't tell you anything. Ideally, hash
should also be resistant to accidental and deliberate collisions.eql?
is value equality, usually stricter than ==
but less strict than equal?
which is more or less identity: equal?
should only return true
if you compare an object to itself.uniq?
uses the same trick that is used in hash tables, hash sets, etc. to speed up lookups:
eql?
.Upvotes: 6