dawg

Reputation: 104082

Ruby sum stdin integers

I have:

$ ruby -v
ruby 2.0.0p648 (2015-12-16 revision 53162) [universal.x86_64-darwin16]

Suppose you have a sequence of integers 1..n. A Ruby novice might sum the sequence like so:

$ ruby -e 's=0
     for i in 1..500000
        s+=i
     end
     puts s'
125000250000

Now suppose the same sequence comes from stdin:

$ seq 1 500000 | ruby -lne 'BEGIN{s=0}
                            s+=$_.to_i
                            END{puts s} '   
125000250000

So far so good.

Now increase the terminal value from 500,000 to 5,000,000:

$ ruby -e 's=0
         for i in 1..5000000
            s+=i
         end
         puts s'
12500002500000   <=== CORRECT

$ seq 1 5000000 | ruby -lne 'BEGIN{s=0}
                             s+=$_.to_i
                             END{puts s} '
500009500025     <=== WRONG!

It produces a different sum.

awk and perl both produce the correct result with the same sequence:

$ seq 1 5000000 | awk '{s+=$1} END{print s}'
12500002500000
$ seq 1 5000000 | perl -nle '$s+=$_; END{print $s}'
12500002500000

Why is ruby producing the incorrect sum? I don't think it is overflow since awk and perl are working correctly on the same input.


Conclusions:

Thank you David Aldridge for diagnosing this.

  1. OS X and BSD seq switch to floating-point (%g) output at 1,000,000, while GNU seq prints integers at arbitrary precision. With its default format, OS X seq is useless as a source of integers of 1,000,000 or greater. Example on OS X:

    $ seq  999999 1000002
    999999
    1e+06
    1e+06
    1e+06
    
  2. The Ruby method String#to_i silently converts only the leading numeric part of a string and ignores the rest; that was the 'bug' in this case. Example:

    irb(main):002:0> '5e+06'.to_i
    #=> 5
    
  3. The 'correct' line in the script either uses $_.to_f.to_i, which parses the input as a float first, or uses Integer($_), which raises an error instead of failing silently. awk and perl parse 5e+06 as a float; Ruby's .to_i does not:

    $ echo '5e+06' | awk '{print $1+0}'
    5000000
    $ echo '5e+06' | ruby -lne 'print $_.to_i+0'
    5
    
  4. And thanks to Stefan Schüßler for opening a Ruby feature request regarding .to_i behavior.
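
The three conversions mentioned in point 3 can be compared side by side. This is only a minimal sketch; the helper name `conversions` is made up for illustration and does not come from the original post.

```ruby
# Compare three ways of turning a line of text into an integer.
# `conversions` is a hypothetical helper name used only for this sketch.
def conversions(str)
  strict = begin
    Integer(str)              # raises ArgumentError unless str is an integer literal
  rescue ArgumentError
    :error
  end
  { to_i: str.to_i,           # keeps the leading digits, ignores the rest
    to_f_to_i: str.to_f.to_i, # parses e-notation as a float, then truncates
    strict: strict }
end

['500000', '5e+06'].each do |s|
  r = conversions(s)
  puts "#{s}: to_i=#{r[:to_i]} to_f.to_i=#{r[:to_f_to_i]} Integer()=#{r[:strict]}"
end
# 500000: to_i=500000 to_f.to_i=500000 Integer()=500000
# 5e+06: to_i=5 to_f.to_i=5000000 Integer()=error
```

Note that to_f.to_i cannot recover digits that seq has already dropped: '1.23457e+06'.to_f.to_i is 1234570, whatever the original integer was.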

Upvotes: 3

Views: 267

Answers (2)

Stefan

Reputation: 114237

To explain the e-notation output, the OS X man page for seq gives some insight:

Use a printf(3) style format to print each number. [...] The default is %g.

Therefore, seq's output is equivalent to Ruby's:

sprintf('%g', 100000)
#=> "100000"

sprintf('%g', 1000000)
#=> "1e+06"
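
Assuming Ruby's %g formatting matches what OS X seq prints, as the man page excerpt suggests, the wrong total from the question can be reproduced without seq at all. A sketch, with `lossy_sum` as a made-up helper name:

```ruby
# Simulate `seq 1 n | ruby -lne '... s += $_.to_i ...'` on OS X:
# format each integer with %g (seq's default there), then apply String#to_i.
# `lossy_sum` is a hypothetical name used only for this sketch.
def lossy_sum(n)
  (1..n).inject(0) { |sum, i| sum + sprintf('%g', i).to_i }
end

puts lossy_sum(500_000)    # 125000250000, still exact below 1,000,000
puts lossy_sum(5_000_000)  # 500009500025, the "WRONG" total from the question
```

From 1,000,000 upward every line becomes something like 1e+06 or 2.5e+06, so to_i contributes only the leading digit (1 through 5) instead of the full value.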

Upvotes: 1

David Aldridge

Reputation: 52376

I'm not sure that this is a 100% answer, but I notice that:

seq 500000 500001 | ruby -lne 'BEGIN{}
                             puts $_
                             END{} '
500000
500001

... but ...

seq 5000000 5000001 | ruby -lne 'BEGIN{}
                             puts $_
                             END{} '
5e+06
5e+06

... so the "relaxed" approach that #to_i takes to converting the values to integers still runs without error, but keeps only the leading digits ...

seq 5000000 5000001 | ruby -lne 'BEGIN{}
                             puts $_.to_i
                             END{} '
5
5

... but the stricter #to_int will not, because String does not define it:

seq 5000000 5000001 | ruby -lne 'BEGIN{}
                             puts $_.to_int
                             END{} '
-e:2:in `<main>': undefined method `to_int' for "5e+06":String (NoMethodError)
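
Since String has no #to_int, the strict conversion for text input is Kernel#Integer, which raises ArgumentError on anything that is not an integer literal. Below is a sketch of a summing loop that fails loudly instead of silently. `strict_sum` is a name invented for this example, and it is assumed to receive any object that responds to each_line (for example $stdin).

```ruby
require 'stringio'

# Sum one integer per line, raising on the first line that is not an integer.
# `strict_sum` is a hypothetical helper; pass $stdin to use it in a pipeline.
def strict_sum(io)
  total = 0
  io.each_line.with_index(1) do |line, lineno|
    begin
      total += Integer(line.chomp)
    rescue ArgumentError
      raise ArgumentError, "line #{lineno}: not an integer: #{line.chomp.inspect}"
    end
  end
  total
end

puts strict_sum(StringIO.new("499999\n500000\n"))   # 999999

begin
  strict_sum(StringIO.new("999999\n1e+06\n"))
rescue ArgumentError => e
  puts e.message                                    # line 2: not an integer: "1e+06"
end
```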

Edit: I also notice:

seq 5000000 5000001

5e+06
5e+06

So an -f flag has to be passed to seq to get integer format.

Edit again:

final answer:

seq -f %f 1 5000000 | ruby -lne 'BEGIN{s=0}
                                  s+=$_.to_i
                                 END{puts s} '

12500002500000
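
This works because %f prints every digit before the decimal point, and String#to_i stops at the '.'. A sketch using Ruby's sprintf as a stand-in for seq -f (an assumption, in the same spirit as the %g comparison in the other answer); `round_trip_sum` is a made-up name:

```ruby
# Stand-in for `seq -f %f`: format with %f, then parse with String#to_i.
puts sprintf('%f', 5_000_000)        # 5000000.000000
puts sprintf('%f', 5_000_000).to_i   # 5000000
puts sprintf('%g', 5_000_000).to_i   # 5

# `round_trip_sum` is a hypothetical helper: sum 1..n after a %f round trip.
def round_trip_sum(n)
  (1..n).inject(0) { |s, i| s + sprintf('%f', i).to_i }
end

n = 100_000
puts round_trip_sum(n) == n * (n + 1) / 2   # true
```

In Ruby, sprintf('%.0f', 5_000_000) gives "5000000" without the trailing zeros; whether seq accepts the same format is worth checking against its man page.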

Upvotes: 5
