Steve Katz
Steve Katz

Reputation: 105

Is there a built-in lazy Hash in Ruby?

I need to populate a Hash with various values. Some of values are accessed often enough and another ones really seldom.

The issue is, I'm using some computation to get values and populating the Hash becomes really slow with multiple keys.

Using some sort of cache is not a option in my case.

I wonder how to make the Hash compute the value only when the key is firstly accessed and not when it is added?

This way, seldom used values wont slow down the filling process.

I'm looking for something that is "kinda async" or lazy access.

Upvotes: 6

Views: 3583

Answers (4)

pje
pje

Reputation: 22727

This isn't strictly an answer to the body of your question, but Enumerable::Lazy will definitely be a part of Ruby 2.0. This will let you do lazy evaluation on iterator compositions:

lazy = [1, 2, 3].lazy.select(&:odd?)
# => #<Enumerable::Lazy: #<Enumerator::Generator:0x007fdf0b864c40>:each>
lazy.to_a 
# => [40, 50]

Upvotes: -1

Jonathan Tran
Jonathan Tran

Reputation: 15276

There are many different ways to approach this. I recommend using an instance of a class that you define instead of a Hash. For example, instead of...

# Example of slow code using regular Hash.
h = Hash.new
h[:foo] = some_long_computation
h[:bar] = another_long_computation
# Access value.
puts h[:foo]

... make your own class and define methods, like this...

class Config
  def foo
    some_long_computation
  end

  def bar
    another_long_computation
  end
end

config = Config.new
puts config.foo

If you want a simple way to cache the long computations or it absolutely must be a Hash, not your own class, you can now wrap the Config instance with a Hash.

config = Config.new
h = Hash.new {|h,k| h[k] = config.send(k) }
# Access foo.
puts h[:foo]
puts h[:foo]  # Not computed again. Cached from previous access.

One issue with the above example is that h.keys will not include :bar because you haven't accessed it yet. So you couldn't, for example, iterate over all the keys or entries in h because they don't exist until they're actually accessed. Another potential issue is that your keys need to be valid Ruby identifiers, so arbitrary String keys with spaces won't work when defining them on Config.

If this matters to you, there are different ways to handle it. One way you can do it is to populate your hash with thunks and force the thunks when accessed.

class HashWithThunkValues < Hash
  def [](key)
    val = super
    if val.respond_to?(:call)
      # Force the thunk to get actual value.
      val = val.call
      # Cache the actual value so we never run long computation again.
      self[key] = val
    end

    val
  end
end

h = HashWithThunkValues.new
# Populate hash.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation }  # Some key that's an invalid ruby identifier.
# Access hash.
puts h[:foo]
puts h[:foo]  # Not computed again. Cached from previous access.
puts h.keys  #=> [:foo, :bar, "invalid Ruby name"]

One caveat with this last example is that it won't work if your values are callable because it can't tell the difference between a thunk that needs to be forced and a value.

Again, there are ways to handle this. One way to do it would be to store a flag that marks whether a value has been evaluated. But this would require extra memory for every entry. A better way would be to define a new class to mark that a Hash value is an unevaluated thunk.

class Unevaluated < Proc
end

class HashWithThunkValues < Hash
  def [](key)
    val = super

    # Only call if it's unevaluated.
    if val.is_a?(Unevaluated)
      # Force the thunk to get actual value.
      val = val.call
      # Cache the actual value so we never run long computation again.
      self[key] = val
    end

    val
  end
end

# Now you must populate like so.
h = HashWithThunkValues.new
h[:foo] = Unevaluated.new { some_long_computation }
h[:bar] = Unevaluated.new { another_long_computation }
h["invalid Ruby name"] = Unevaluated.new { a_third_computation }  # Some key that's an invalid ruby identifier.
h[:some_proc] = Unevaluated.new { Proc.new {|x| x + 2 } }

The downside of this is that now you have to remember to use Unevaluted.new when populating your Hash. If you want all values to be lazy, you could override []= also. I don't think it would actually save much typing because you'd still need to use Proc.new, proc, lambda, or ->{} to create the block in the first place. But it might be worthwhile. If you did, it might look something like this.

class HashWithThunkValues < Hash
  def []=(key, val)
    super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
  end
end

So here is the full code.

class HashWithThunkValues < Hash

  # This can be scoped inside now since it's not used publicly.
  class Unevaluated < Proc
  end

  def [](key)
    val = super

    # Only call if it's unevaluated.
    if val.is_a?(Unevaluated)
      # Force the thunk to get actual value.
      val = val.call
      # Cache the actual value so we never run long computation again.
      self[key] = val
    end

    val
  end

  def []=(key, val)
    super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
  end

end

h = HashWithThunkValues.new
# Populate.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation }  # Some key that's an invalid ruby identifier.
h[:some_proc] = ->{ Proc.new {|x| x + 2 } }

Upvotes: 12

user904990
user904990

Reputation:

you can use this:

class LazyHash < Hash

  def [] key
    (_ = (@self||{})[key]) ? 
      ((self[key] = _.is_a?(Proc) ? _.call : _); @self.delete(key)) :
      super
  end

  def lazy_update key, &proc
    (@self ||= {})[key] = proc
    self[key] = proc
  end

end

Your lazy hash will behave exactly as a normal Hash, cause it is actually a real Hash.

See live demo here

*** UPDATE - answering to nested procs question ***

Yes, it would work, but it is cumbersome.

See updated answer.

Use lazy_update instead of []= to add "lazy" values to your hash.

Upvotes: 3

Roland Mai
Roland Mai

Reputation: 31077

You can define your own indexer with something like this:

class MyHash
  def initialize
    @cache = {}
  end

  def [](key)
    @cache[key] || (@cache[key] = compute(key))
  end

  def []=(key, value)
    @cache[key] = value
  end

  def compute(key)
    @cache[key] = 1
  end
end

and use it as follows:

1.9.3p286 :014 > hash = MyHash.new
 => #<MyHash:0x007fa0dd03a158 @cache={}> 

1.9.3p286 :019 > hash["test"]
 => 1 

1.9.3p286 :020 > hash
 => #<MyHash:0x007fa0dd03a158 @cache={"test"=>1}> 

Upvotes: 3

Related Questions