Simon
Simon

Reputation: 25983

Which is faster in ruby - a hash lookup or a function with a case statement?

We have a few places in a time-critical script where we convert old IDs into strings. At the moment, we use case statements inside a function, like so:

def get_name id
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

I'm considering replacing this with a hash lookup, like so:

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

It feels like it ought to be faster to use NAMES[id] than get_name(id) - but is it?

Upvotes: 15

Views: 7713

Answers (7)

Amala
Amala

Reputation: 1748

Here is an example that shows the case faster for symbol lookup. The previous examples were integer based keys.

https://gist.github.com/02c8f8ca0cd4c9d9ceb2

Upvotes: 1

zetetic
zetetic

Reputation: 47548

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]

$ cat hash_vs_case.rb 
require 'benchmark'

def get_from_case(id)
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

def get_from_hash(arg)
  NAMES[arg]
end

n = 1000000
Benchmark.bm do |test|
  test.report("case  1") { n.times do; get_from_case(1); end }
  test.report("hash  1") { n.times do; get_from_hash(1); end}
  test.report("case 42") { n.times do; get_from_case(42); end }
  test.report("hash 42") { n.times do; get_from_hash(42); end}
end

$ ruby -w hash_vs_case.rb 
      user     system      total        real
case  1  0.330000   0.000000   0.330000 (  0.422209)
hash  1  0.220000   0.000000   0.220000 (  0.271300)
case 42  0.310000   0.000000   0.310000 (  0.390295)
hash 42  0.320000   0.010000   0.330000 (  0.402647)

And here is why you want to upgrade:

$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]

$ ruby -w hash_vs_case.rb 
      user     system      total        real
case  1  1.380000   0.870000   2.250000 (  2.738493)
hash  1  1.320000   0.850000   2.170000 (  2.642013)
case 42  1.500000   0.960000   2.460000 (  3.029923)
hash 42  1.890000   0.890000   2.780000 (  3.456969)

Upvotes: 1

Bill Dueber
Bill Dueber

Reputation: 2706

A couple points, first. One is that low-level language constructs like this that more-or-less do the same thing are almost never the bottleneck in any real-world application, so it's (often) foolish to focus on them. Second, as has already been mentioned, if you're really concerned about it you should benchmark it. Ruby's benchmarking and profile tools are certainly not the most advanced in the programming ecosystem, but they get the job done.

My gut instinct is that hashes are going to be faster because (again, I'm guessing) the case statement must check each condition in turn (making finding the items O(n) instead of O(1)). But let's check!

Full benchmarking code is at https://gist.github.com/25 Basically, it generates a file that defines the appropriate case/hash and then uses them. I went ahead and put the hash lookup within a method call, too, so that overhead won't be a factor, but in real life there's no reason it should be stuck inside a method.

Here's what I get. In each case, I'm doing 10,000 lookups. Time is user-time in seconds

Case statement, 10 items  0.020000
Hash lookup, 10 items     0.010000

Case statement, 100 items  0.100000
Hash lookup, 100 items     0.010000

Case statement, 1000 items  0.990000
Hash lookup, 1000 items     0.010000

So, it looks very much like the case statement is O(n) (no shocker there). Also note that 10K lookups is still only a second even in the case statement, so unless you're doing a metric butload of these lookups, you're better off focusing on the rest of your code.

Upvotes: 33

RKh
RKh

Reputation: 14161

As Martin said, it depends on how many IDs you want to check. Are you picking the ID and corresponding strings from database or you want to hard-code it. If there is only a couple of checks, then you can go with CASE. But there is no option to modify the ID/String pair, if need arises.

On the other hand, if you are picking a lot of IDs from a database, you can use something like Dictionary to store name/value pairs.

While a dictionary is in the memory, lookup might be fast.

But in the end, it all depends on whether you are working with ever-changing ID/string pair or few constants only.

Upvotes: 0

MartinStettner
MartinStettner

Reputation: 29164

Since this depends on a number of factors (how many different IDs you want to convert, how intelligently the compiler can compile the case when statemens), my advice would be: Measure it:

Write a small test routine and convert, say, 10.000.000 ids to strings. Do this a couple of times with either implementation and compare the results. If you have no significant difference, take whatever you like more (I think, the hash solution is a bit more elegant...)

Upvotes: 6

Ben Lee
Ben Lee

Reputation: 53319

Why not use a case statement inline in the time-critical portion of your code, rather than making it its own function call? That avoids that overhead of a call stack, and also avoids the overhead of a hash lookup.

You can also always do a benchmark yourself. Do something like this:

t = Time.now
1_000_000.times { ...your code... }
secs = Time.now - t

Replacing "your code" with each option.

Upvotes: 0

Nakilon
Nakilon

Reputation: 35084

case when has n complexity.
hash lookup has ln(n) complexity, but using an additional object (hash) and calling its methods have its own penalty.

So if you have a lot of cases (thousands, millions, ...) hash lookup is better. But in your case, when you have only 3 variants, case when of course will be much faster.

Upvotes: -1

Related Questions