Reputation: 25983
We have a few places in a time-critical script where we convert old IDs into strings. At the moment, we use case statements inside a function, like so:
def get_name id
case id
when 1
"one thing"
when 3
"other thing"
else
"default thing"
end
end
I'm considering replacing this with a hash lookup, like so:
NAMES = {
1 => "one thing",
3 => "other thing",
}
NAMES.default = "default thing"
It feels like it ought to be faster to use NAMES[id]
than get_name(id)
- but is it?
Upvotes: 15
Views: 7713
Reputation: 1748
Here is an example that shows the case faster for symbol lookup. The previous examples were integer based keys.
https://gist.github.com/02c8f8ca0cd4c9d9ceb2
Upvotes: 1
Reputation: 47548
$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
$ cat hash_vs_case.rb
require 'benchmark'
def get_from_case(id)
case id
when 1
"one thing"
when 3
"other thing"
else
"default thing"
end
end
NAMES = {
1 => "one thing",
3 => "other thing",
}
NAMES.default = "default thing"
def get_from_hash(arg)
NAMES[arg]
end
n = 1000000
Benchmark.bm do |test|
test.report("case 1") { n.times do; get_from_case(1); end }
test.report("hash 1") { n.times do; get_from_hash(1); end}
test.report("case 42") { n.times do; get_from_case(42); end }
test.report("hash 42") { n.times do; get_from_hash(42); end}
end
$ ruby -w hash_vs_case.rb
user system total real
case 1 0.330000 0.000000 0.330000 ( 0.422209)
hash 1 0.220000 0.000000 0.220000 ( 0.271300)
case 42 0.310000 0.000000 0.310000 ( 0.390295)
hash 42 0.320000 0.010000 0.330000 ( 0.402647)
And here is why you want to upgrade:
$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
$ ruby -w hash_vs_case.rb
user system total real
case 1 1.380000 0.870000 2.250000 ( 2.738493)
hash 1 1.320000 0.850000 2.170000 ( 2.642013)
case 42 1.500000 0.960000 2.460000 ( 3.029923)
hash 42 1.890000 0.890000 2.780000 ( 3.456969)
Upvotes: 1
Reputation: 2706
A couple points, first. One is that low-level language constructs like this that more-or-less do the same thing are almost never the bottleneck in any real-world application, so it's (often) foolish to focus on them. Second, as has already been mentioned, if you're really concerned about it you should benchmark it. Ruby's benchmarking and profile tools are certainly not the most advanced in the programming ecosystem, but they get the job done.
My gut instinct is that hashes are going to be faster because (again, I'm guessing) the case statement must check each condition in turn (making finding the items O(n) instead of O(1)). But let's check!
Full benchmarking code is at https://gist.github.com/25 Basically, it generates a file that defines the appropriate case/hash and then uses them. I went ahead and put the hash lookup within a method call, too, so that overhead won't be a factor, but in real life there's no reason it should be stuck inside a method.
Here's what I get. In each case, I'm doing 10,000 lookups. Time is user-time in seconds
Case statement, 10 items 0.020000
Hash lookup, 10 items 0.010000
Case statement, 100 items 0.100000
Hash lookup, 100 items 0.010000
Case statement, 1000 items 0.990000
Hash lookup, 1000 items 0.010000
So, it looks very much like the case statement is O(n) (no shocker there). Also note that 10K lookups is still only a second even in the case statement, so unless you're doing a metric butload of these lookups, you're better off focusing on the rest of your code.
Upvotes: 33
Reputation: 14161
As Martin said, it depends on how many IDs you want to check. Are you picking the ID and corresponding strings from database or you want to hard-code it. If there is only a couple of checks, then you can go with CASE. But there is no option to modify the ID/String pair, if need arises.
On the other hand, if you are picking a lot of IDs from a database, you can use something like Dictionary to store name/value pairs.
While a dictionary is in the memory, lookup might be fast.
But in the end, it all depends on whether you are working with ever-changing ID/string pair or few constants only.
Upvotes: 0
Reputation: 29164
Since this depends on a number of factors (how many different IDs you want to convert, how intelligently the compiler can compile the case when
statemens), my advice would be: Measure it:
Write a small test routine and convert, say, 10.000.000 ids to strings. Do this a couple of times with either implementation and compare the results. If you have no significant difference, take whatever you like more (I think, the hash solution is a bit more elegant...)
Upvotes: 6
Reputation: 53319
Why not use a case
statement inline in the time-critical portion of your code, rather than making it its own function call? That avoids that overhead of a call stack, and also avoids the overhead of a hash lookup.
You can also always do a benchmark yourself. Do something like this:
t = Time.now
1_000_000.times { ...your code... }
secs = Time.now - t
Replacing "your code" with each option.
Upvotes: 0
Reputation: 35084
case when
has n complexity.
hash lookup
has ln(n) complexity, but using an additional object (hash) and calling its methods have its own penalty.
So if you have a lot of cases (thousands, millions, ...) hash lookup is better. But in your case, when you have only 3 variants, case when
of course will be much faster.
Upvotes: -1