shidoro

Reputation: 370

Time complexity for two different solutions

I want to understand the difference in time complexity between these two solutions. The task itself is not relevant, but if you're curious, here's the link with the explanation.

This is my first solution. Scores a 100% in correctness but 0% in performance:

def solution(s, p, q):
    dna = dict({'A': 1, 'C': 2, 'G': 3, 'T': 4})
    result = []
    
    for i in range(len(q)):
        least = 4
        
        for c in set(s[p[i] : q[i] + 1]):
            if least > dna[c]: least = dna[c]
        
        result.append(least)
            
    return result

This is the second solution. Scores 100% in both correctness and performance:

def solution(s, p, q):
    result = []
    
    for i in range(len(q)):
        if 'A' in s[p[i]:q[i] + 1]: result.append(1)
        elif 'C' in s[p[i]:q[i] + 1]: result.append(2)
        elif 'G' in s[p[i]:q[i] + 1]: result.append(3)
        else: result.append(4)
        
    return list(result)

Now this is how I see it. In both solutions I'm iterating through a range of len(q), and on each iteration I'm slicing a different portion of a string, each slice with a length between 1 and 100,000.

Here's where I get confused: in my first solution, on each iteration I slice one portion of the string and create a set from it to remove all the duplicates. The set can have a length between 1 and 4, so iterating through it must be very quick, and I only iterate through it once per outer iteration.

In the second solution, on each iteration I slice a portion of the string up to three times and search through it, so in the worst case that's three passes over a slice of length 100,000.

Then why is the second solution faster? How can the first have a time complexity of O(n*m) and the second O(n+m)?

I thought it was because of the in and for operators, but I tried the same second solution in JavaScript with the indexOf method and it still gets 100% in performance. But why? I could understand it if, in Python, the in and for operators had different implementations and worked differently behind the scenes, but in JS the indexOf method is just going to apply a for loop. Isn't that the same as doing the for loop directly inside my function? Shouldn't that be O(n*m) time complexity?

Upvotes: 1

Views: 184

Answers (2)

Lorenzo Felletti

Reputation: 563

You haven't specified how the performance rating is obtained, but in any case the second algorithm is clearly better, mainly because it uses the in operator, which under the hood calls a function implemented in C that is far more efficient than interpreted Python. More on this topic here. Also, I'm not sure, but I don't think the Python interpreter is smart enough to slice the string only once and then reuse the same portion the other times in the second algorithm. Creating the set in the first algorithm also looks like a very costly operation.
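As a rough sketch of that C-versus-bytecode gap (timings are machine-dependent, but the shape should hold):

import timeit

# 'A' only appears at the very end, so both scans have to walk the whole string.
text = 'CGT' * 333333 + 'A'

def python_scan(chunk):
    # Hand-written character loop, executed as Python bytecode.
    for c in chunk:
        if c == 'A':
            return True
    return False

print(timeit.timeit(lambda: python_scan(text), number=20))  # interpreted loop
print(timeit.timeit(lambda: 'A' in text, number=20))        # C-implemented search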

Lastly, maybe the performance ratings aren't based on the algorithm complexity, but rather on the execution time over different test strings?

Upvotes: 1

Stef

Reputation: 15525

I think the difference in complexity can easily be showcased with an example.

Consider the following input:

s = 'ACGT' * 1000000
# = 'ACGTACGTACGTACGTACGTACGTACGTACGTACGT...ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT'
p = [0]
q = [3999999]

Algorithm 2 very quickly checks that 'A' is in s[0:4000000] (it's the first character - no need to iterate through the whole string to find it!).

Algorithm 1, on the other hand, must iterate through the whole string s[0:4000000] to build the set {'A','C','G','T'}, because iterating through the whole string is the only way to check that there isn't a fifth distinct character hidden somewhere in the string.
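You can check this directly with a quick sketch (exact times will vary):

import timeit

s = 'ACGT' * 1000000
p, q = [0], [3999999]

# Algorithm 2's test can stop as soon as it sees the 'A' at index 0
# (the slice itself still copies the string, as noted below).
print(timeit.timeit(lambda: 'A' in s[p[0]:q[0] + 1], number=10))

# Algorithm 1 has to build the set from all 4,000,000 characters first.
print(timeit.timeit(lambda: set(s[p[0]:q[0] + 1]), number=10))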

Important note

I said algorithm 2 should be fast on this example, because the test 'A' in ... doesn't need to iterate through the whole string to find 'A' if 'A' is at the beginning of the string. However, note that there can be an important difference in complexity between 'A' in s and 'A' in s[0:4000000]. The problem is that creating a slice of the string might cost time (and memory) if it's copying the string. Instead of slicing, you should use s.find('A', 0, 4000000), which is guaranteed not to build a copy. For more information on this:
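Here is a sketch of what the second solution looks like with s.find instead of slicing (same logic, just avoiding the copies):

def solution(s, p, q):
    result = []
    for start, end in zip(p, q):
        # str.find with start/end bounds searches in place,
        # without materialising a copy of s[start:end + 1].
        if s.find('A', start, end + 1) != -1:
            result.append(1)
        elif s.find('C', start, end + 1) != -1:
            result.append(2)
        elif s.find('G', start, end + 1) != -1:
            result.append(3)
        else:
            result.append(4)
    return result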

Upvotes: 1
