tjrburgess

Reputation: 787

ZRANGEBYSCORE in Redis and Ruby

I am trying to optimize ZRANGEBYSCORE in my Redis code in Ruby.

Specifically, the Redis website (http://redis.io/commands/zrangebyscore) states:

Time complexity: O(log(N)+M) with N being the number of elements in the sorted set and M the number of elements being returned. If M is constant (e.g. always asking for the first 10 elements with LIMIT), you can consider it O(log(N)).

So, as I read it, as long as I am using LIMIT, the big-O complexity should stay at O(log(N)) regardless of whether the limit is set to 48 or 6. However, my benchmarking seems to suggest otherwise.

require 'redis'

@@redis = Redis.new  # the connection was not shown in the original snippet

def bench(descr)
  start = Time.now
  yield
  puts "#{descr} #{Time.now - start} seconds"
end

def with_pipelining(limit)
  id = 26053643
  @@redis.pipelined {
    1000.times {
      @@redis.zrevrangebyscore("key:#{id}", "+inf", "-inf", :limit => [0, limit], :with_scores => true)
    }
  }
end

bench("with pipelining_48") { with_pipelining(48) }
bench("with pipelining_24") { with_pipelining(24) }
bench("with pipelining_12") { with_pipelining(12) }
bench("with pipelining_6")  { with_pipelining(6) }

Running this returns the following results:

with pipelining_48 1.709097 seconds
with pipelining_24 0.930054 seconds
with pipelining_12 0.801045 seconds
with pipelining_6 0.633037 seconds

My results appear inconsistent with the Redis documentation. Can someone please shed some light on what may be causing the differences?

Upvotes: 2

Views: 480

Answers (2)

Eli

Reputation: 39009

So, first, for a true benchmark, you should take the average of several runs after throwing out outliers. The differences between all of these operations are in milliseconds, so a tiny network hiccup, or something strange happening on your box, is going to throw the numbers off. If you do that, your numbers will probably look different.

Second, when you run an operation against Redis, there's a lot more happening than just the operation itself. A connection to Redis needs to be made, along with other start-up costs; the data needs to be transferred; and there's network back-and-forth. What you're timing is not just the cost of the operation but all of these auxiliary costs as well. They are also non-linear, since Redis may use different internal encodings for the data depending on its size.

All in all, the main point is that zrevrangebyscore is O(log(n)+m), but there are auxiliary costs involved in the operation as well, and at low operation cost (which is almost always the case with Redis), those costs dwarf the cost of the operation itself. This is a good thing.
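To make the "average after throwing out outliers" point concrete, here is a minimal sketch of a fairer timing helper. It is not from the question's code: `bench_avg` is a hypothetical name, and it simply runs the block several times, drops the fastest and slowest run, and averages the rest.

```ruby
# Run a block `runs` times, discard the fastest and slowest timings,
# and return the mean of the remaining runs (a simple trimmed mean).
def bench_avg(runs = 7)
  times = Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  trimmed = times.sort[1..-2]  # drop the two extreme outliers
  trimmed.sum / trimmed.size
end

# Usage with the question's call (any block works):
# avg = bench_avg { @@redis.zrevrangebyscore("key:26053643", "+inf", "-inf", :limit => [0, 48]) }
```

Using a monotonic clock instead of `Time.now` also avoids skew if the wall clock is adjusted mid-benchmark.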

Upvotes: 1

Itamar Haber

Reputation: 50112

I believe you misunderstood the meaning of big-O. Big-O is a complexity measure and, as Redis' documentation states, by keeping M constant you can discard its "contribution" to the complexity (basically, O(constant) = O(1)). That means that when you compare such operations, as long as M is kept the same, only N plays a role.

However, this does not mean that the performance is the same. Returning 6 elements differs from returning 24 because you're moving different volumes of data, regardless of the complexity class.
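A toy cost model illustrates why the measured times grow with the limit while the ratio stays far below the ratio of the limits. Everything here is an illustrative assumption, not a measurement: `modeled_time` is a hypothetical function, and the coefficients are made up.

```ruby
# Toy model: total time = O(log N) seek + per-element transfer + fixed overhead.
# All coefficients are illustrative assumptions, not measured values.
def modeled_time(n, m, seek_unit: 0.001, per_element: 0.02, overhead: 0.4)
  Math.log2(n) * seek_unit + m * per_element + overhead
end

t48 = modeled_time(1_000_000, 48)
t6  = modeled_time(1_000_000, 6)
# With these toy numbers t48 / t6 is roughly 2.6 -- well below 48/6 = 8,
# because the fixed overhead dominates at small M, much like the
# benchmark's 1.7s vs 0.63s spread.
```

The per-element term makes larger limits slower even though M is "constant" for complexity purposes; the fixed overhead is what keeps the gap from scaling linearly with M.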

Upvotes: 0
