EugeneMi
EugeneMi

Reputation: 3575

How are Arrays Sets and SortedSets implemented under the hood in Ruby

Usually Arrays are implemented as a chunks of memory, sets as hash maps, and sorted sets as skip lists. Is that also the case in Ruby?

I'm trying to evaluate the use of different containers in Ruby in terms of performance and memory footprint

Upvotes: 6

Views: 2243

Answers (2)

Kaplan Ilya
Kaplan Ilya

Reputation: 703

Previous answer didn't actually cover SortedSet performance.

  • Array - O(1) for add, delete and access element by index
  • Hash (uses hash table) - O(1) (larger 1 of course, than arrays) for add, delete, and access by key
  • Set - implemented by Hash so same
  • SortedSet:

So after reading the code and testing manually using class which counting every compare method call, here's the result. It tries to load 'rbtree' module and uses regular Set as fallback.

So 2 options:

  1. RBTree available, everything's fast and as expected, O(logN) on every addition, O(1) on getting first, O(N) on soeted_set.to_a.
  2. RBTree is not available. It uses Set (Hash underneath) and reorder if dirty on read.

From Ruby 2.0.0 set.rb:

def to_a
  (@keys = @hash.keys).sort! unless @keys
  @keys
end

Meaning that following code:

X.times{ sorted_set << num; sorted_set.first }

will be O(XNlog(N)), since ordering is approximate Nlog(N).

So basically using SortedSet without making sure that RBTree is available is confusing and inefficient, since it's not using the fact that it's sorted (binary search) when inserting. So using normal Set and array methods might be faster in this case.

Same example using regular Set:

X.times{ set << num; set.min }

will be O(XN)

Bottom line, if you need SortedSet work efficient, also add 'rbtree' to Gemfile.

Upvotes: 2

J&#246;rg W Mittag
J&#246;rg W Mittag

Reputation: 369488

Arrays are part of the Ruby core library. Each Ruby implementation has its own implementation of arrays. The Ruby Language Specification only specifies the behavior of Ruby arrays, it does not specify any particular implementation strategy. It doesn't even specify any performance constraints that would force or at least suggest a particular implementation strategy.

However, most Rubyists have some expectation about the performance characteristics of arrays that would force an implementation that doesn't meet them into obscurity, because nobody would actually use it:

  • inserting, prepending or appending as well as deleting an element has a worst-case step-complexity of O(n) and an amortized worst-case step-complexity of O(1)
  • accessing an element has a worst-case step-complexity of O(1)
  • iterating over all elements has a worst-case step-complexity of O(n)

This means that arrays need to be implemented as dynamic arrays with exponential resizing, otherwise those performance guarantees couldn't be met. You might get away with a very wide and shallow tree, but AFAIK no Ruby implementation does that.

Here's Rubinius's array implementation, which I personally find the easiest to read of all Ruby implementations. (Note: only the bare essentials are implemented in C++, most of the array methods are implemented in Ruby, e.g. in kernel/common/array.rb).

Set and SortedSet are part of the set library in the stdlib. The stdlib is shared mostly unchanged between Ruby implementations (at least the parts that are actually written in Ruby, obviously the parts written in other languages can't be shared), and since Set is written in Ruby, you can expect it to the same on all Ruby implementations.

Set is implemented as a Hash, where only the keys are used, the values are simply always true: see Set#add in lib/set.rb.

SortedSet is backed by a Red-Black-Tree that isn't implemented in Ruby.

Upvotes: 8

Related Questions