How can bisect_right() be 4x slower than insort_right()?

Question

I'm trying to optimize a solution that times out on a competitive programming website. I started using cProfile and it appears to show bisect_right() requiring 4x as much time as insort_right(). This is a profiling run on input list with over 40k entries:

         89936 function calls in 3.596 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.596    3.596 :1()
        1    0.180    0.180    3.596    3.596 leetcode.py:38(go)
        1    3.357    3.357    3.415    3.415 leetcode.py:6(numSubarrayProductLessThanK)
    44965    0.045    0.000    0.045    0.000 {built-in method _bisect.bisect_right}
    44965    0.014    0.000    0.014    0.000 {built-in method _bisect.insort_right}
        1    0.000    0.000    3.596    3.596 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

I thought all list insertions were O(n) since on average n/2 elements must be moved. And simply determining the location of insertion into sorted list would be O(log n). But in the profile report it looks flipped: the inserter insort_right() is slower than the location determiner bisect_right(). Where am I going wrong? Here is the code:

from bisect import insort, bisect

class Solution:
    def numSubarrayProductLessThanK(self, nums, k):
        if k == 0:
            return 0

        result = 0
        product = 1 
        products = [1]
        products_n = 1

        for num in nums:
            product *= num

            factor_min = product // k
            n_factors_less = products_n - bisect(products, factor_min)

            result += n_factors_less

            insort(products, product)
            products_n += 1

        return result

Thank you for looking.

user2357112 · Accepted Answer

Your bisect and insort calls pass completely different arguments.

Assuming nums is a list of positive integers, your insort calls will always insert the new product at the end of the products list, which takes O(1) rather than O(n) time for insertion, and which is highly amenable to branch prediction for the binary search. (Python object comparisons are more complex than a simple hardware comparison instruction, but branch prediction still applies to the logic within the comparison hooks, particularly since int comparisons are implemented in C.)

Meanwhile, assuming k is substantially greater than typical elements of nums, your bisect calls will find a spot for factor_min somewhere in the middle of the list, likely near but not at the end most of the time. This is much less amenable to branch prediction.

Without a more sophisticated test, I cannot be sure that branch prediction is the dominating factor, but it seems likely.

How can bisect_right() be 4x slower than insort_right()?

Answers (1)

Related Questions