Reputation: 3575
Usually Arrays are implemented as a chunks of memory, sets as hash maps, and sorted sets as skip lists. Is that also the case in Ruby?
I'm trying to evaluate the use of different containers in Ruby in terms of performance and memory footprint
Upvotes: 6
Views: 2243
Reputation: 703
Previous answer didn't actually cover SortedSet performance.
So after reading the code and testing manually using class which counting every compare method call, here's the result. It tries to load 'rbtree' module and uses regular Set as fallback.
So 2 options:
From Ruby 2.0.0 set.rb:
def to_a
(@keys = @hash.keys).sort! unless @keys
@keys
end
Meaning that following code:
X.times{ sorted_set << num; sorted_set.first }
will be O(XNlog(N)), since ordering is approximate Nlog(N).
So basically using SortedSet without making sure that RBTree is available is confusing and inefficient, since it's not using the fact that it's sorted (binary search) when inserting. So using normal Set and array methods might be faster in this case.
Same example using regular Set:
X.times{ set << num; set.min }
will be O(XN)
Bottom line, if you need SortedSet work efficient, also add 'rbtree' to Gemfile.
Upvotes: 2
Reputation: 369488
Arrays are part of the Ruby core library. Each Ruby implementation has its own implementation of arrays. The Ruby Language Specification only specifies the behavior of Ruby arrays, it does not specify any particular implementation strategy. It doesn't even specify any performance constraints that would force or at least suggest a particular implementation strategy.
However, most Rubyists have some expectation about the performance characteristics of arrays that would force an implementation that doesn't meet them into obscurity, because nobody would actually use it:
This means that arrays need to be implemented as dynamic arrays with exponential resizing, otherwise those performance guarantees couldn't be met. You might get away with a very wide and shallow tree, but AFAIK no Ruby implementation does that.
Here's Rubinius's array implementation, which I personally find the easiest to read of all Ruby implementations. (Note: only the bare essentials are implemented in C++, most of the array methods are implemented in Ruby, e.g. in kernel/common/array.rb
).
Set
and SortedSet
are part of the set
library in the stdlib. The stdlib is shared mostly unchanged between Ruby implementations (at least the parts that are actually written in Ruby, obviously the parts written in other languages can't be shared), and since Set
is written in Ruby, you can expect it to the same on all Ruby implementations.
Set
is implemented as a Hash
, where only the keys are used, the values are simply always true
: see Set#add
in lib/set.rb
.
SortedSet
is backed by a Red-Black-Tree that isn't implemented in Ruby.
Upvotes: 8