Reputation: 11
I'm a university student majoring in Data Science, and we're currently discussing time-complexity. I could already code in Python before my studies, but I've never payed much attention to time efficiency, and I have never studied Big-O notation before. It seems like, in many exercises, I'm always close to the solution but my answer is never right.
Take for example this Python code:
def fun(N, M):
S1 = set(N)
S2 = set(M)
res = []
for x in S1:
if x in S2:
for i in range(N.count(x)):
res.append(x)
return res
Say length of N is n and length of M is m. My reasoning is the following:
for x in S1
is at worst O(n) because it loops at worst n times if x in S2
is at worst O(m) because at worst it makes m comparisons N.count(x)
is O(n) because - worst case - it will perform n operations (while counting the occurrences of x in N) .append()
operation is performed n times (as I said above)If I'm correct, complexity should be O(n3m). Yet, the correct solution is O(n2+ m)
Can you help me understanding where my reasoning fails?
I'm sorry if my question may seem obvious or naive, I'm not that experienced in thinking in complexity terms.
Upvotes: 0
Views: 111
Reputation: 15206
You need to check the complexity for all lines:
def fun(N, M):
S1 = set(N) # O(n)
S2 = set(M) # O(m)
res = [] # O(1)
for x in S1: # O(n) * ... SEE CORRECTION
if x in S2: # O(1)
for i in range(N.count(x)): # O(n) * ... SEE CORRECTION
res.append(x) # O(1)
return res # O(1)
Then put it together:
O(n + m + 1 + n*(1+n*1) + 1) = O(n**2 + m)
Correction / subtleties:
N.count(x)
is always O(n)
, but it's O(N)+...
not O(n)*...
. The *
complexity depends on the value of N.count(x)
not its complexity.
Think of it as:
k = N.count(x) # O(n)
for i in range(k): # O(k)*...
instead of:
for i in range(N.count(x)): # O(???)
Then what I wrote is the worst case (although with a very misleading shortcut) when all elements in N
are different. In that case for x in S1:
is really O(n)*...
but then k = N.count(x) = 1
for all x
so for i in range(N.count(x)):
is O(n) + O(1)*...
(not O(n)*...
) still leading to O(n + m + 1 + n*(1+n+1*1) + 1) = O(n**2 + m)
.
In the best case however when all elements in N
are identical, then there is only one value for x
so for x in S1:
is O(1)*...
, but also k = N.count(x) = n
for that unique x
. so for i in range(N.count(x)):
is O(n) + O(n)*...
now leading to O(n + m + 1 + 1*(1+n+n*1) + 1) = O(n + m)
.
(But you typically shouldn't focus on best case complexity.)
Upvotes: 4