I'm surprised that in Ruby, num >> 1
is slower than num / 2,
since I always thought that bitwise operators
should be faster than normal arithmetic operators.
# ruby 3.0.3
require "benchmark/ips"

Benchmark.ips do |bm|
  num = 1000000

  bm.report("num/2") do
    num/2
  end

  bm.report("num >> 1") do
    num >> 1
  end

  bm.compare!
end
# num/2: 16045584.8 i/s
# num >> 1: 14591335.3 i/s - 1.10x slower
Benchmark.ips do |bm|
  num = 1000000

  bm.report("num * 2") do
    num * 2
  end

  bm.report("num << 1") do
    num << 1
  end

  bm.compare!
end
# num * 2: 18252815.2 i/s
# num << 1: 14289700.6 i/s - 1.28x slower
Based on comments, I had failed to include the actual iterations in the testing. That has been corrected, and the code and results updated accordingly.
I didn't test with benchmark/ips as that is not a core gem or part of the Ruby standard library. However, based on some refactored tests using the standard Benchmark#bmbm method, it seems that some of your results may be skewed by the overhead of String assignment in some versions of Ruby, as I noticed slightly more variance across multiple runs when not using frozen strings. I also got slightly more consistent results without YJIT enabled in C Ruby, as that seems to add some variable amount of overhead even when its optimizations improve an application's overall speed. However, I didn't notice a material difference on Ruby 3.3.0 with or without frozen strings, and regardless of whether YJIT was enabled or not.
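Before benchmarking, it's worth confirming from inside the process whether YJIT and frozen string literals are actually in effect, since both can shift results between runs. This is a minimal sketch assuming CRuby 3.1 or later, where RubyVM::YJIT is defined; on other engines the guard simply reports false:

```ruby
# frozen_string_literal: true

# Report whether YJIT and frozen string literals are active, since both
# can skew microbenchmark results between runs.
yjit_on = defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
frozen_literals = "probe".frozen? # true when the magic comment is honored

puts "YJIT enabled: #{!!yjit_on}"
puts "Frozen string literals: #{frozen_literals}"
```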
Consider the following, which uses frozen strings on Ruby 3.3.0:
# frozen_string_literal: true
require "benchmark"
NUM = 1_000_000
ITER = 100_000_000
Benchmark.bmbm do |x|
  x.report("NUM / 2")  do ITER.times { NUM / 2 }  end
  x.report("NUM >> 1") do ITER.times { NUM >> 1 } end
  x.report("NUM * 2")  do ITER.times { NUM * 2 }  end
  x.report("NUM << 1") do ITER.times { NUM << 1 } end
end
Even at 100 million iterations, on Ruby 3.3.0 with both YJIT and frozen strings enabled, I consistently get the following, plus or minus a few milliseconds, when running ruby --yjit shift_ops.rb:
Rehearsal --------------------------------------------
NUM / 2 2.172535 0.000201 2.172736 ( 2.176693)
NUM >> 1 2.257154 0.000226 2.257380 ( 2.261687)
NUM * 2 2.078395 0.000204 2.078599 ( 2.081848)
NUM << 1 2.101787 0.000122 2.101909 ( 2.104296)
----------------------------------- total: 8.610624sec
user system total real
NUM / 2 2.153104 0.000117 2.153221 ( 2.156942)
NUM >> 1 2.296988 0.000197 2.297185 ( 2.301160)
NUM * 2 2.114520 0.000157 2.114677 ( 2.118208)
NUM << 1 2.121999 0.000158 2.122157 ( 2.125518)
100 million iterations consistently complete in about 2.1 to 2.3 seconds of wall clock time, or roughly 22 ns per iteration. That seems fast enough to me for any practical purpose. Worrying about iterations per second when the total time for each method only varies by about 1.8 ns per iteration between the highest and lowest values seems unreasonable for all but the most rigorous use cases. For more typical use cases, that minor delta is likely immaterial. However, you can do better with a different engine!
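The per-iteration arithmetic can be double-checked directly from the post-rehearsal "real" column of the run above:

```ruby
# Back-of-the-envelope per-iteration cost, using the post-rehearsal "real"
# times (in seconds) from the CRuby 3.3.0 + YJIT run above.
ITERATIONS = 100_000_000
reals = {
  "NUM / 2"  => 2.156942,
  "NUM >> 1" => 2.301160,
  "NUM * 2"  => 2.118208,
  "NUM << 1" => 2.125518,
}

per_iter_ns = reals.transform_values { |t| t / ITERATIONS * 1e9 }
per_iter_ns.each { |op, ns| puts format("%-9s %.2f ns/iteration", op, ns) }

# Spread between the slowest and fastest operator, per iteration:
spread = per_iter_ns.values.max - per_iter_ns.values.min
puts format("spread: %.2f ns/iteration", spread)  # ~1.83 ns
```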
If you really need faster performance out of the reference Ruby implementation, consider using threads, ractors, or a multi-core approach that optimizes for concurrency. The parallel gem can help with that on mainline Ruby. Other engines may provide additional alternatives.
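As an illustration of the concurrency angle (a sketch of the shape, not a benchmark): work can be split into chunks and fanned out across threads. Note that on CRuby the GVL serializes pure CPU-bound Ruby code, so this pays off mainly on JRuby or TruffleRuby; on CRuby, Ractors or process-based parallelism (e.g. via the parallel gem) are the route to true parallelism.

```ruby
# Sketch: divide a batch of right-shifts across four threads and combine
# the partial sums. On CRuby the GVL serializes CPU-bound Ruby code, so
# this helps mainly on JRuby/TruffleRuby; on CRuby, consider Ractors or
# processes instead.
chunks = (1..1_000_000).each_slice(250_000).to_a

threads = chunks.map do |chunk|
  Thread.new { chunk.sum { |n| n >> 1 } }
end

total = threads.sum(&:value)
puts total  # => 250000000000
```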
Note that if you're running inside IRB instead of executing a file, you should start IRB as:
RUBYOPT="--enable=frozen_string_literal" irb --nocolorize
to ensure that frozen string literals are enabled by default and to prevent ANSI colorization from affecting performance.
It's worth noting that the code above ran faster on C Ruby than on JRuby or TruffleRuby when not looping. This is to be expected, since those engines optimize for different use cases and have higher overhead, but it is nevertheless worth noting that the reference implementation is currently the fastest when you don't need to iterate extensively!
If you truly need to run millions of operations, TruffleRuby in native mode is blazingly fast. The same 100 million iterations on TruffleRuby 24.0.0, run as ruby shift_ops.rb without any other code changes or engine-specific optimizations, yields:
Rehearsal --------------------------------------------
NUM / 2 0.344032 0.009242 0.353274 ( 0.265151)
NUM >> 1 0.303473 0.002182 0.305655 ( 0.259223)
NUM * 2 0.372648 0.001641 0.374289 ( 0.337829)
NUM << 1 0.355533 0.001281 0.356814 ( 0.326600)
----------------------------------- total: 1.390032sec
user system total real
NUM / 2 0.353571 0.000581 0.354152 ( 0.245380)
NUM >> 1 0.337815 0.000339 0.338154 ( 0.253574)
NUM * 2 0.336717 0.000570 0.337287 ( 0.246457)
NUM << 1 0.351651 0.000632 0.352283 ( 0.246673)
That cuts the time for 100 million operations down to around 2.5 ns per iteration, roughly a 9x improvement in total time over a very large set of operations. On smaller sets, the improvement is significantly smaller, but someone will doubtless ding my math if I try to express the difference in the fractions of a millisecond involved.
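Using the NUM / 2 "real" times from the two runs above, the relative speedup works out as follows:

```ruby
# Cross-check the engine comparison using the NUM / 2 "real" times
# (seconds for 100 million iterations) reported above.
cruby_real   = 2.156942  # CRuby 3.3.0 with YJIT
truffle_real = 0.245380  # TruffleRuby 24.0.0, native mode
iterations   = 100_000_000

speedup = cruby_real / truffle_real
puts format("TruffleRuby is %.1fx faster (%.1f ns vs %.1f ns per iteration)",
            speedup,
            cruby_real / iterations * 1e9,
            truffle_real / iterations * 1e9)
# => TruffleRuby is 8.8x faster (21.6 ns vs 2.5 ns per iteration)
```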
In other words, you can probably gain the performance improvements you want simply by using a different Ruby engine to perform the calculations. This is definitely worth considering.
* and / Methods

I also found it notable that when testing with TruffleRuby across multiple runs, or when increasing the number of iterations, the shift operators often beat the multiplication and division operators during rehearsal, but almost never after warmup. While the total time for both engines dropped during the post-rehearsal benchmarking run, over very large iteration counts the >> method frequently took noticeably longer than the / method, while << showed a much smaller difference compared to *. The same was true for C Ruby, where most runs of any type showed a similar pattern, but the deltas were always wider.
My educated guess, although you'd have to look at the underlying VM or C code to be sure, is that * and / are more frequently used methods in Ruby code. Unlike languages that specifically target systems programming, I would suspect that the multiplication and division methods have received more time, attention, and VM optimization than the binary shift methods. In particular, >> may simply be a more expensive call for Ruby for whatever reason.
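One way to probe that guess without reading the C source is to inspect the bytecode CRuby emits. To the best of my knowledge, / and << compile to specialized YARV instructions (opt_div and opt_ltlt), while >> has no specialized instruction and goes through a generic method dispatch, which would be consistent with the benchmark deltas above. A sketch, assuming CRuby:

```ruby
# Compare the YARV bytecode CRuby emits for division, left shift, and
# right shift. opt_div and opt_ltlt are specialized instructions; >>
# falls back to a generic send.
div_code   = RubyVM::InstructionSequence.compile("num = 10; num / 2").disasm
ltlt_code  = RubyVM::InstructionSequence.compile("num = 10; num << 1").disasm
shift_code = RubyVM::InstructionSequence.compile("num = 10; num >> 1").disasm

puts "division uses opt_div:         #{div_code.include?('opt_div')}"
puts "left shift uses opt_ltlt:      #{ltlt_code.include?('opt_ltlt')}"
puts "right shift uses generic send: #{shift_code.include?('opt_send_without_block')}"
```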
That said, if raw performance is your goal, then you should use the most performant methods available based on benchmarking your real code. Regardless of the methods you decide to use, these results seem more than fast enough for most practical purposes, but they do highlight that the issue likely lies in the underlying VM or a given engine's implementation of binary shifts rather than in the Ruby language itself.
If you think it's a bug as opposed to a language or implementation design decision then you can always report it on the Ruby issue tracker. However, from a purely pragmatic point of view, all options are likely fast enough for most practical use cases. In addition, you always have the option of writing your own native C extensions or calling out from Ruby to a purpose-built tool for number-crunching applications that really do need to perform at a higher rate of instructions per second.