wteuber
wteuber

Reputation: 1238

Why is Ruby's Array#count unreliable for large arrays?

I'm iterating over a combination of 2 elements of a fairly large array. When counting the elements of the combination, I found something odd. The following example shows what I mean:

[1] pry(main)> 10000.times.to_a.combination(2).count
=> 49995000   # correct

[2] pry(main)> 100000.times.to_a.combination(2).count
=> 704982704  # wrong, should be 4999950000

[3] pry(main)> count = 0; 100000.times.to_a.combination(2).each { count+=1 }; count
=> 4999950000 # correct

I double-checked the results using wolframalpha:

My question really is, why Array#count is not reliable in this case?

Also, see https://ruby-doc.org/core-2.2.0/Array.html#method-i-combination and https://ruby-doc.org/core-2.2.0/Array.html#method-i-count.

Thanks a lot.

Upvotes: 2

Views: 100

Answers (1)

cremno
cremno

Reputation: 4927

Array#count isn't the buggy method, Enumerator#count is:

100000.times.to_a.combination(2).class # => Enumerator

The good news is the bug you're experiencing was reported a few months ago as #14805 and subsequently fixed but the bad news is since then no new version of CRuby was released. So either wait for 2.5.2, 2.4.5, etc. or compile a version with the fix by yourself.

The issue is the counter of an enumerator was limited to a signed 32-bit integer, so an overflow occurred:

max = (1 << 31) - 1 # max value
4999950000 & max    # => 704982704

It was fixed by making the counter a bignum (Ruby's arbitrary precision integer type, since 2.4 an internal implementation detail) in case its value would be out of bounds of an int or as of now an unsigned long.

Upvotes: 8

Related Questions