Reputation: 370
I want to understand the difference in time complexity between these two solutions. The task itself is not relevant, but if you're curious, here's the link with the explanation.
This is my first solution. It scores 100% in correctness but 0% in performance:
def solution(s, p, q):
    dna = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    result = []
    for i in range(len(q)):
        least = 4
        for c in set(s[p[i]:q[i] + 1]):
            if least > dna[c]: least = dna[c]
        result.append(least)
    return result
This is the second solution. It scores 100% in both correctness and performance:
def solution(s, p, q):
    result = []
    for i in range(len(q)):
        if 'A' in s[p[i]:q[i] + 1]: result.append(1)
        elif 'C' in s[p[i]:q[i] + 1]: result.append(2)
        elif 'G' in s[p[i]:q[i] + 1]: result.append(3)
        else: result.append(4)
    return result
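Both versions agree on a small hand-made sanity check (the values below are my own, not from the task):

solution('CAGCCTA', [2, 5, 0], [4, 5, 6])
# -> [2, 4, 1]: the minimum of 'GCC' is C (2), 'T' alone is 4, and 'CAGCCTA' contains an A (1)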
Now this is how I see it. In both solutions I'm iterating through a range of len(q) queries, and on each iteration I'm slicing a different portion of a string whose length is between 1 and 100,000.
Here's where I get confused. In my first solution, on each iteration I slice a portion of the string once and create a set from it to remove all the duplicates. The set can have a length between 1 and 4, so iterating through it must be very quick, and I only iterate through it once per query.
In the second solution, on each iteration I slice a portion of the string up to three times and scan it, in the worst case three times, each scan over up to 100,000 characters.
Then why is the second solution faster? How can the first have a time complexity of O(n*m) and the second O(n+m)?
I thought it was because of the in and for operators, but I tried the same second solution in JavaScript with the indexOf method and it still gets 100% in performance. But why? I could understand it if Python's in and for operators had different implementations and worked differently behind the scenes, but in JS the indexOf method is just going to apply a for loop. Isn't that the same as doing the for loop directly inside my function? Shouldn't that be an O(n*m) time complexity?
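To make the constant-factor difference concrete, here is a rough micro-benchmark of the two solutions (my own test data, not the grader's; absolute numbers depend on the machine and CPython version):

import timeit

def solution1(s, p, q):
    # first version: hashes every character of each slice into a set
    dna = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    result = []
    for i in range(len(q)):
        least = 4
        for c in set(s[p[i]:q[i] + 1]):
            if least > dna[c]: least = dna[c]
        result.append(least)
    return result

def solution2(s, p, q):
    # second version: the C-level 'in' scan can stop at the first match
    result = []
    for i in range(len(q)):
        if 'A' in s[p[i]:q[i] + 1]: result.append(1)
        elif 'C' in s[p[i]:q[i] + 1]: result.append(2)
        elif 'G' in s[p[i]:q[i] + 1]: result.append(3)
        else: result.append(4)
    return result

s = 'ACGT' * 250_000                  # 1,000,000 characters
p, q = [0] * 10, [len(s) - 1] * 10    # 10 queries spanning the whole string

print(timeit.timeit(lambda: solution1(s, p, q), number=1))  # touches all 1,000,000 characters per query
print(timeit.timeit(lambda: solution2(s, p, q), number=1))  # copies the slice, then finds 'A' at index 0 per query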
Upvotes: 1
Views: 184
Reputation: 563
You haven't specified how the performance rating is obtained, but in any case the second algorithm is clearly better, mainly because it uses the in operator, which under the hood calls a function implemented in C; that is far more efficient than a loop run by the Python interpreter. More on this topic here.
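As a rough illustration (my own timing sketch; exact numbers vary by machine and CPython version), here is the same linear scan done once through the C-implemented in operator and once as an explicit Python loop:

import timeit

s = 'CGT' * 100_000 + 'A'   # 'A' appears only at the very end, forcing a full scan

def python_scan(text, ch):
    # character-by-character search executed by the bytecode interpreter
    for c in text:
        if c == ch:
            return True
    return False

print(timeit.timeit(lambda: 'A' in s, number=100))             # C-level scan
print(timeit.timeit(lambda: python_scan(s, 'A'), number=100))  # pure-Python scan

Both scans do O(n) work, so the asymptotic complexity is the same; only the constant factor differs, typically by one to two orders of magnitude.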
Also, I'm not sure, but I don't think the Python interpreter is smart enough to slice the string only once and then reuse the same portion the other times in the second algorithm.
Creating the set in the first algorithm also seems like a very costly operation.
Lastly, maybe the performance ratings aren't based on the algorithm complexity, but rather on the execution time over different test strings?
Upvotes: 1
Reputation: 15525
I think the difference in complexity can easily be showcased with an example.
Consider the following input:
s = 'ACGT' * 1000000
# = 'ACGTACGTACGTACGTACGTACGTACGTACGTACGT...ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT'
p = [0]
q = [3999999]
Algorithm 2 very quickly checks that 'A' is in s[0:4000000] (it's the first character, so there is no need to iterate through the whole string to find it!).
Algorithm 1, on the other hand, must iterate through the whole slice s[0:4000000] to build the set {'A', 'C', 'G', 'T'}, because iterating through the whole string is the only way to check that there isn't a fifth distinct character hidden somewhere in it.
I said algorithm 2 should be fast on this example because the test 'A' in ... doesn't need to iterate through the whole string to find 'A' when 'A' is at the beginning of the string. However, note a possibly important difference in cost between 'A' in s and 'A' in s[0:4000000]: creating a slice of the string can itself take time (and memory) proportional to the slice's length, because it generally copies that portion of the string. Instead of slicing, you should use s.find('A', 0, 4000000), which is guaranteed not to build a copy (see the sketch below). For more information on this:
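A small timing sketch of that difference (the sizes are my own, made slightly larger than the example above so that the slice is a proper substring; numbers depend on the machine and CPython version):

import timeit

s = 'ACGT' * 1_250_000   # 5,000,000 characters

# 'in' on a slice first copies the 4,000,000 selected characters, then scans the copy
print(timeit.timeit(lambda: 'A' in s[0:4_000_000], number=100))

# str.find(sub, start, end) scans the original string in place, with no copy
print(timeit.timeit(lambda: s.find('A', 0, 4_000_000) != -1, number=100))

Both find 'A' immediately at index 0, so almost any measured gap is the cost of building the 4,000,000-character copy.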
Upvotes: 1