Reputation: 213

Ruby: Differences in object identity of Integers

Why does 1.equal?(1) return true but (2**100).equal?(2**100) return false?

And should this be considered a bug?

Upvotes: 2

Answers (4)

dawg

Reputation: 104062

Ruby used to have separate classes, Fixnum and Bignum, which were unified into the Integer class in Ruby 2.4. However, the concept remains: smaller integers are stored differently from larger ones (on 64-bit C Ruby, the crossover is at 2**62).

The 64 bit MRI / YARV version of Ruby has a three tier object storage model:

An 8-byte node that encodes TINY or IMMEDIATE objects directly inside it, OR is a pointer to...
A 40-byte RVALUE structure, otherwise known as a slot, which can fully contain a SMALL object as an IMMEDIATE value OR is the starting 40 bytes (data and pointer) of…
Something bigger, which uses the RVALUE data for the initial part of the object plus a pointer to a heap memory block from malloc sized for the object.

What can fit into a TINY node as an immediate value? Floats, Boolean values, short symbols, and signed integers up to 2**62 - 1; anything else is a pointer to a larger object.

What can fit into an RVALUE as an RVALUE IMMEDIATE value? Smallish Bignum-type integers (2**62 and up), short strings, longer symbols; or a pointer to the rest of the object held in heap memory allocated from the OS. Larger Bignum-type integers may require OS memory allocation to hold.

Using ObjectSpace, you can see the crossover of memory usage:

irb(main):080> require 'objspace'
=> false
irb(main):081> ObjectSpace.memsize_of(2**62-1)  # immediate value
=> 0
irb(main):082> ObjectSpace.memsize_of(2**62)    # RVALUE IMMEDIATE
=> 40
irb(main):083> ObjectSpace.memsize_of(2**100)   # RVALUE IMMEDIATE
=> 40
irb(main):084> ObjectSpace.memsize_of(2**1000)  # RVALUE POINTER
=> 168

The .equal? method compares not only mathematical equivalence but also object identity. It is false unless both operands are the very same object. The only way two different math operations can yield the same object is for Ruby to realize "I have seen this before" and point to the object previously used. This is known as interning.

Observationally, it would appear that any Integer object that is not in a TINY node as an immediate object is treated as a different object, regardless of whether it fits in the RVALUE or is an object that the RVALUE points to. Since TINY objects are essentially tagged machine words, these are easier to track. It is likely not efficient to try to catalog every larger Integer in order to intern those objects.

For the same reason that the Fixnum/Bignum distinction was hidden behind Integer, using .equal? with Integers is simply not useful. It seems to work for smaller values: (2**(99-89)).equal?(2**(1+9)) returns true, which looks like interning, but this is not guaranteed behavior. It is only observed behavior that could change at any time.

Singleton methods cannot be defined on instances of the Integer class. That might lead one to believe that the interning behavior is the reason. Maybe. It leads me to believe that the interning behavior seen with small (immediate) integers may become promised Integer behavior in the future. So far, it is not.
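As a quick check of that prohibition, here is a minimal sketch. The helper name singleton_definable? is made up for illustration, and the results in the comments are what I would expect on C Ruby, not something the language promises:

```ruby
# Hypothetical helper: reports whether a singleton method can be attached to obj.
def singleton_definable?(obj)
  obj.define_singleton_method(:probe) { :ok }
  true
rescue TypeError
  # C Ruby raises TypeError here for Integers of any size
  false
end

singleton_definable?(1)           # false for a small (immediate) Integer
singleton_definable?(2**100)      # false for a large Integer as well
singleton_definable?(Object.new)  # true for an ordinary object
```

Note that the prohibition applies to large Integers too, even though they are not the same object each time, so on its own it says nothing about interning.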

Use eql? or == to compare two integers.
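To make the difference between the three checks concrete, here is a small sketch. The compare helper and its hash keys are names I made up; the identity result for large integers is observed C Ruby behavior only:

```ruby
# Hypothetical helper: runs all three comparisons on a pair of objects.
def compare(x, y)
  {
    value: x == y,               # same numeric value
    value_and_type: x.eql?(y),   # same value and same class
    identity: x.equal?(y)        # the very same object
  }
end

compare(2**100, 2**100)  # value and value_and_type are true; identity is false on 64-bit C Ruby
compare(1, 1.0)          # value is true; value_and_type and identity are false
```

So == is the right choice when only the number matters, and eql? when an Integer should not match a Float.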

Upvotes: 4

Alex

Reputation: 30023

Not a bug, just a limit of the 64-bit machine word. Once you go past what fits in 64 bits (minus the tag and sign bits), Ruby has to do something else to allocate an integer; it cannot be an "immediate" value anymore:

>> 1.object_id.to_s(2)  # object id in binary
=> "11"                 # last bit is reserved and is always a 1

>> (2**62 - 1).object_id.to_s(2)
=> "111111111111111111111111111111111111111111111111111111111111111" # 63 bits

We get a 63-bit object id that is just the tagged value itself rather than the address of an allocated object; one more bit is reserved for the sign. Past this point Ruby has to fall back to normal object allocation for integers. You can only rely on integers being equal? (same object) up to this limit; in general, you should use eql? (same value and type) or == (same value).

Integers are encoded by shifting the number left one, then setting the last bit to 1. So the number 1 will be encoded as 3 (or in binary 11), and the number 40 will be encoded as 81 (or 1010001 in binary). If the pointer we’re dealing with has a 1 in the last bit, we know it’s a Ruby integer, and we can decode it (convert to a C integer) by shifting right 1.

https://tenderlovemaking.com/2017/02/01/object-id-in-mri/
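The encoding described in that quote can be sketched in plain Ruby. The helper names below are hypothetical; this only mirrors the arithmetic and is not MRI's actual C code:

```ruby
# Hypothetical helpers mirroring the tagging scheme quoted above.

# Shift left by one and set the lowest bit to mark the word as an integer.
def tag_fixnum(n)
  (n << 1) | 1
end

# A word with the lowest bit set is treated as an encoded integer.
def tagged_fixnum?(word)
  (word & 1) == 1
end

# Shift right by one to recover the original integer.
def untag_fixnum(word)
  word >> 1
end

tag_fixnum(1)            # 3
tag_fixnum(40)           # 81
tag_fixnum(40).to_s(2)   # "1010001"
untag_fixnum(81)         # 40
tag_fixnum(2**62 - 1)    # 9223372036854775807, i.e. 2**63 - 1
```

The last line shows why 2**62 - 1 is the largest value that still fits: its encoded form is the largest signed 64-bit number.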

Upvotes: 2

user513951

Reputation: 13705

Integers that fit into the space of the old and deprecated Fixnum are part of the category of objects known as "immediate values":

Fixnum, true, nil, and false are implemented as immediate values. With immediate values, variables hold the objects themselves, rather than references to them.

Singleton methods cannot be defined for such objects. Two Fixnums of the same value always represent the same object instance, so (for example) instance variables for the Fixnum with the value 1 are shared between all the 1’s in the system. This makes it impossible to define a singleton method for just one of these.

I would encourage you to consider that the real fix is not to change the behavior of equal?, but rather to change the name. The system makes perfect sense if you rename equal? to same_object? and ==/eql? to same_value?.

Upvotes: 2

Tom Lord

Reputation: 28305

On my version of ruby, 2**62 seems to be the cutoff:

(2**62 - 1).equal?(2**62 - 1) #=> true
(2**62).equal?(2**62) #=> false

I expect this varies between ruby implementations/installations.
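If you want to check your own installation, a rough probe like the following should do. identity_cutoff_exponent is a made-up helper name, and the result depends on the Ruby build:

```ruby
# Hypothetical helper: smallest k (up to limit) for which two separately
# computed 2**k values are not the same object, or nil if none is found.
def identity_cutoff_exponent(limit = 128)
  (1..limit).find { |k| !(2**k).equal?(2**k) }
end

identity_cutoff_exponent  # 62 would match the observation above; other builds may differ
```

A nil result would mean no cutoff was observed up to the limit, which I would not rule out on other implementations.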

The reason for this is that .equal? is comparing object identity - which is a stricter check for equality than the standard "value identity" of ==. For example:

:test.equal?(:test) #=> true
# Because :test.object_id == :test.object_id
# i.e. It's exactly the same object every time

"test".equal?("test") #=> false
# Because "test".object_id != "test".object_id
# i.e. It's a different object each time. Note that strings in ruby are mutable!

And going back to the example of integers:

0.object_id == 1
1.object_id == 3
2.object_id == 5
# ...
(2**62 - 1).object_id == 9223372036854775807
(2**62).object_id == 300 # Or whatever. A different number each time!

... So beyond a certain limit, it seems ruby is assigning "custom object IDs" to the value, rather than it being a well-defined constant, or a re-used reference like it does with Symbols.

Is this a bug? Not really; ruby never promises something like "all integers have a fixed object_id", like it does with Symbols. Admittedly it's a slight surprise/quirk of the language, but then again, I don't think Ruby even promises that 1.equal?(1) must return true! It doesn't work for plenty of other objects, e.g. /foo/.equal?(/foo/) #=> false.

Lastly, I'm unsure why you'd be using .equal? to compare integers in the first place, rather than ==. I'm guessing this was posted purely as an academic question, because I don't expect "real" code would run into this problem at all.

Upvotes: 3
