I have:
$ ruby -v
ruby 2.0.0p648 (2015-12-16 revision 53162) [universal.x86_64-darwin16]
Suppose you have a sequence of integers 1..n. A Ruby novice would sum the sequence like so:
$ ruby -e 's=0
for i in 1..500000
s+=i
end
puts s'
125000250000
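As a cross-check on the expected totals, the loop can be compared with a more idiomatic reduction and with the closed form n(n+1)/2. This is only a minimal sketch: the method names `range_sum` and `gauss_sum` are made up for illustration, and `inject(:+)` is used because Ruby 2.0 predates `Enumerable#sum`.

```ruby
# Idiomatic reduction over the range (illustrative name, not from the question).
def range_sum(n)
  (1..n).inject(:+)
end

# Closed form n(n+1)/2, handy for checking what the total should be.
def gauss_sum(n)
  n * (n + 1) / 2
end

puts range_sum(500_000)    # 125000250000
puts gauss_sum(5_000_000)  # 12500002500000
```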
Now suppose that the same sequence comes from stdin:
$ seq 1 500000 | ruby -lne 'BEGIN{s=0}
s+=$_.to_i
END{puts s} '
125000250000
So far so good.
Now increase the terminal value from 500,000 to 5,000,000:
$ ruby -e 's=0
for i in 1..5000000
s+=i
end
puts s'
12500002500000 <=== CORRECT
$ seq 1 5000000 | ruby -lne 'BEGIN{s=0}
s+=$_.to_i
END{puts s} '
500009500025 <=== WRONG!
It produces a different sum.
awk and perl both produce the correct result with the same sequence:
$ seq 1 5000000 | awk '{s+=$1} END{print s}'
12500002500000
$ seq 1 5000000 | perl -nle '$s+=$_; END{print $s}'
12500002500000
Why is ruby producing the incorrect sum? I don't think it is overflow, since awk and perl work correctly on the same input.
Conclusions:
Thank you David Aldridge for diagnosing this.
OS X and BSD seq switch to floating-point e-notation output at 1,000,000, while GNU seq supports arbitrary precision integers. With its default format, OS X seq is useless as a source of integers greater than 1,000,000. Example on OS X:
$ seq 999999 1000002
999999
1e+06
1e+06
1e+06
The Ruby method .to_i silently converts the leading numeric part of a string to an integer and ignores the rest, and that was the 'bug' in this case. Example:
irb(main):002:0> '5e+06'.to_i
#=> 5
The 'correct' line in the script is either $_.to_f.to_i, which parses the line as a float first, or Integer($_), which raises an error instead of letting the script fail silently. awk and perl implicitly parse 5e+06 as a float, and ruby's .to_i does not:
$ echo '5e+06' | awk '{print $1+0}'
5000000
$ echo '5e+06' | ruby -lne 'print $_.to_i+0'
5
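The three conversions mentioned above can be compared side by side. This is a minimal sketch on a single hard-coded line; the variable names are illustrative only.

```ruby
line = '5e+06'

# String#to_i stops at the first character that cannot be part of an integer.
lenient = line.to_i

# Float() understands e-notation; to_i then truncates the float.
via_float = Float(line).to_i

# Integer() refuses the string outright instead of guessing.
strict = begin
  Integer(line)
rescue ArgumentError => e
  e.class
end

puts lenient    # 5
puts via_float  # 5000000
puts strict     # ArgumentError
```

Neither conversion restores digits that seq has already dropped: in the OS X example above, 1000001 and 1000002 are both printed as 1e+06, so going through a float parses the text correctly but still yields 1000000 for each.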
And thanks to Stefan Schüßler for opening a Ruby feature request regarding .to_i behavior.
Upvotes: 3
To explain the e-notation output, the OS X man page for seq gives some insight:
Use a printf(3) style format to print each number. [...] The default is %g.
Therefore, seq's output is equivalent to Ruby's:
sprintf('%g', 100000)
#=> "100000"
sprintf('%g', 1000000)
#=> "1e+06"
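The effect on the sum can be sketched by emulating that default in Ruby. The helper names below are hypothetical, and the sketch assumes that Ruby's `%g` rounds the same way as the printf used by OS X seq.

```ruby
# Emulate one line of default BSD/OS X seq output (assumption: plain "%g").
def bsd_seq_line(i)
  format('%g', i)
end

# Sum the emulated lines the way the question's one-liner does, via String#to_i.
def lenient_sum(first, last)
  (first..last).inject(0) { |sum, i| sum + bsd_seq_line(i).to_i }
end

puts bsd_seq_line(999_999)              # 999999
puts bsd_seq_line(1_000_001)            # 1e+06
puts lenient_sum(999_999, 1_000_002)    # 999999 + 1 + 1 + 1 = 1000002
```

Under that assumption, `lenient_sum(1, 5_000_000)` should reproduce the 500009500025 from the question: the lines up to 999999 sum correctly to 499999500000, and every later line contributes only its leading digit, which adds 10000025.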
Upvotes: 1
I'm not sure that this is a 100% answer, but I notice that:
seq 500000 500001 | ruby -lne 'BEGIN{}
puts $_
END{} '
500000
500001
... but ...
seq 5000000 5000001 | ruby -lne 'BEGIN{}
puts $_
END{} '
5e+06
5e+06
... so the "relaxed" approach that #to_i takes to converting the values to integers will still run without error, but it only picks up the leading digit ...
seq 5000000 5000001 | ruby -lne 'BEGIN{}
puts $_.to_i
END{} '
5
5
... but the stricter #to_int will not, because String does not define it:
seq 5000000 5000001 | ruby -lne 'BEGIN{}
puts $_.to_int
END{} '
-e:2:in `<main>': undefined method `to_int' for "5e+06":String (NoMethodError)
Edit: I also notice:
seq 5000000 5000001
5e+06
5e+06
So an -f flag has to be passed to seq to avoid the e-notation output.
Edit again, final answer:
seq -f %f 1 5000000 | ruby -lne 'BEGIN{s=0}
s+=$_.to_i
END{puts s} '
12500002500000
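If the script should fail loudly rather than depend on the format seq happens to use, the summation can be written with strict parsing. This is a sketch under stated assumptions: `strict_sum` is a hypothetical helper, and it expects every line to be a plain base-10 integer, so it would also reject `%f`-style lines such as 1.000000.

```ruby
# Hypothetical helper: sum lines strictly, naming the first offending line.
def strict_sum(lines)
  lines.each_with_index.inject(0) do |sum, (line, index)|
    begin
      # Integer(str, 10) raises ArgumentError for "5e+06" or "1.000000".
      sum + Integer(line.strip, 10)
    rescue ArgumentError
      raise ArgumentError, "line #{index + 1}: not an integer: #{line.strip.inspect}"
    end
  end
end

puts strict_sum(%w[1 2 3])  # 6

begin
  strict_sum(['999999', '1e+06'])
rescue ArgumentError => e
  puts e.message            # line 2: not an integer: "1e+06"
end
```

To use it on piped input, `strict_sum($stdin.each_line)` could replace the BEGIN/END one-liner.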
Upvotes: 5