Reputation: 105
I need to populate a Hash with various values. Some of values are accessed often enough and another ones really seldom.
The issue is, I'm using some computation to get values and populating the Hash becomes really slow with multiple keys.
Using some sort of cache is not a option in my case.
I wonder how to make the Hash compute the value only when the key is firstly accessed and not when it is added?
This way, seldom used values wont slow down the filling process.
I'm looking for something that is "kinda async" or lazy access.
Upvotes: 6
Views: 3583
Reputation: 22727
This isn't strictly an answer to the body of your question, but Enumerable::Lazy
will definitely be a part of Ruby 2.0. This will let you do lazy evaluation on iterator compositions:
lazy = [1, 2, 3].lazy.select(&:odd?)
# => #<Enumerable::Lazy: #<Enumerator::Generator:0x007fdf0b864c40>:each>
lazy.to_a
# => [40, 50]
Upvotes: -1
Reputation: 15276
There are many different ways to approach this. I recommend using an instance of a class that you define instead of a Hash. For example, instead of...
# Example of slow code using regular Hash.
h = Hash.new
h[:foo] = some_long_computation
h[:bar] = another_long_computation
# Access value.
puts h[:foo]
... make your own class and define methods, like this...
class Config
def foo
some_long_computation
end
def bar
another_long_computation
end
end
config = Config.new
puts config.foo
If you want a simple way to cache the long computations or it absolutely must be a Hash, not your own class, you can now wrap the Config instance with a Hash.
config = Config.new
h = Hash.new {|h,k| h[k] = config.send(k) }
# Access foo.
puts h[:foo]
puts h[:foo] # Not computed again. Cached from previous access.
One issue with the above example is that h.keys
will not include :bar
because you haven't accessed it yet. So you couldn't, for example, iterate over all the keys or entries in h
because they don't exist until they're actually accessed. Another potential issue is that your keys need to be valid Ruby identifiers, so arbitrary String keys with spaces won't work when defining them on Config
.
If this matters to you, there are different ways to handle it. One way you can do it is to populate your hash with thunks and force the thunks when accessed.
class HashWithThunkValues < Hash
def [](key)
val = super
if val.respond_to?(:call)
# Force the thunk to get actual value.
val = val.call
# Cache the actual value so we never run long computation again.
self[key] = val
end
val
end
end
h = HashWithThunkValues.new
# Populate hash.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation } # Some key that's an invalid ruby identifier.
# Access hash.
puts h[:foo]
puts h[:foo] # Not computed again. Cached from previous access.
puts h.keys #=> [:foo, :bar, "invalid Ruby name"]
One caveat with this last example is that it won't work if your values are callable because it can't tell the difference between a thunk that needs to be forced and a value.
Again, there are ways to handle this. One way to do it would be to store a flag that marks whether a value has been evaluated. But this would require extra memory for every entry. A better way would be to define a new class to mark that a Hash value is an unevaluated thunk.
class Unevaluated < Proc
end
class HashWithThunkValues < Hash
def [](key)
val = super
# Only call if it's unevaluated.
if val.is_a?(Unevaluated)
# Force the thunk to get actual value.
val = val.call
# Cache the actual value so we never run long computation again.
self[key] = val
end
val
end
end
# Now you must populate like so.
h = HashWithThunkValues.new
h[:foo] = Unevaluated.new { some_long_computation }
h[:bar] = Unevaluated.new { another_long_computation }
h["invalid Ruby name"] = Unevaluated.new { a_third_computation } # Some key that's an invalid ruby identifier.
h[:some_proc] = Unevaluated.new { Proc.new {|x| x + 2 } }
The downside of this is that now you have to remember to use Unevaluted.new
when populating your Hash. If you want all values to be lazy, you could override []=
also. I don't think it would actually save much typing because you'd still need to use Proc.new
, proc
, lambda
, or ->{}
to create the block in the first place. But it might be worthwhile. If you did, it might look something like this.
class HashWithThunkValues < Hash
def []=(key, val)
super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
end
end
So here is the full code.
class HashWithThunkValues < Hash
# This can be scoped inside now since it's not used publicly.
class Unevaluated < Proc
end
def [](key)
val = super
# Only call if it's unevaluated.
if val.is_a?(Unevaluated)
# Force the thunk to get actual value.
val = val.call
# Cache the actual value so we never run long computation again.
self[key] = val
end
val
end
def []=(key, val)
super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
end
end
h = HashWithThunkValues.new
# Populate.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation } # Some key that's an invalid ruby identifier.
h[:some_proc] = ->{ Proc.new {|x| x + 2 } }
Upvotes: 12
Reputation:
you can use this:
class LazyHash < Hash
def [] key
(_ = (@self||{})[key]) ?
((self[key] = _.is_a?(Proc) ? _.call : _); @self.delete(key)) :
super
end
def lazy_update key, &proc
(@self ||= {})[key] = proc
self[key] = proc
end
end
Your lazy hash will behave exactly as a normal Hash
, cause it is actually a real Hash
.
*** UPDATE - answering to nested procs question ***
Yes, it would work, but it is cumbersome.
See updated answer.
Use lazy_update
instead of []
= to add "lazy" values to your hash.
Upvotes: 3
Reputation: 31077
You can define your own indexer with something like this:
class MyHash
def initialize
@cache = {}
end
def [](key)
@cache[key] || (@cache[key] = compute(key))
end
def []=(key, value)
@cache[key] = value
end
def compute(key)
@cache[key] = 1
end
end
and use it as follows:
1.9.3p286 :014 > hash = MyHash.new
=> #<MyHash:0x007fa0dd03a158 @cache={}>
1.9.3p286 :019 > hash["test"]
=> 1
1.9.3p286 :020 > hash
=> #<MyHash:0x007fa0dd03a158 @cache={"test"=>1}>
Upvotes: 3