Why is Ruby's Array#count unreliable for large arrays?

Question

I'm iterating over a combination of 2 elements of a fairly large array. When counting the elements of the combination, I found something odd. The following example shows what I mean:

[1] pry(main)> 10000.times.to_a.combination(2).count
=> 49995000   # correct

[2] pry(main)> 100000.times.to_a.combination(2).count
=> 704982704  # wrong, should be 4999950000

[3] pry(main)> count = 0; 100000.times.to_a.combination(2).each { count+=1 }; count
=> 4999950000 # correct

I double-checked the results using wolframalpha:

pairs from 10000 entries https://www.wolframalpha.com/input/?i=10000!%2F((10000-2)!*2!)
pairs from 100000 entries https://www.wolframalpha.com/input/?i=100000!%2F((100000-2)!*2!)

My question really is, why Array#count is not reliable in this case?

Also, see https://ruby-doc.org/core-2.2.0/Array.html#method-i-combination and https://ruby-doc.org/core-2.2.0/Array.html#method-i-count.

Thanks a lot.

cremno · Accepted Answer

Array#count isn't the buggy method, Enumerator#count is:

100000.times.to_a.combination(2).class # => Enumerator

The good news is the bug you're experiencing was reported a few months ago as #14805 and subsequently fixed but the bad news is since then no new version of CRuby was released. So either wait for 2.5.2, 2.4.5, etc. or compile a version with the fix by yourself.

The issue is the counter of an enumerator was limited to a signed 32-bit integer, so an overflow occurred:

max = (1 << 31) - 1 # max value
4999950000 & max    # => 704982704

It was fixed by making the counter a bignum (Ruby's arbitrary precision integer type, since 2.4 an internal implementation detail) in case its value would be out of bounds of an int or as of now an unsigned long.

Why is Ruby's Array#count unreliable for large arrays?

Answers (1)

Related Questions

Why is Ruby&#39;s Array#count unreliable for large arrays?

Answers (1)

Related Questions

Why is Ruby's Array#count unreliable for large arrays?