Futurballa
Futurballa

Reputation: 11

Struggle to understand time-complexity. Where do my reasoning fail?

I'm a university student majoring in Data Science, and we're currently discussing time-complexity. I could already code in Python before my studies, but I've never payed much attention to time efficiency, and I have never studied Big-O notation before. It seems like, in many exercises, I'm always close to the solution but my answer is never right.

Take for example this Python code:

def fun(N, M):
    S1 = set(N)
    S2 = set(M)
    res = []
    for x in S1:
        if x in S2:
            for i in range(N.count(x)):
                res.append(x)
    return res

Say length of N is n and length of M is m. My reasoning is the following:

If I'm correct, complexity should be O(n3m). Yet, the correct solution is O(n2+ m)

Can you help me understanding where my reasoning fails?
I'm sorry if my question may seem obvious or naive, I'm not that experienced in thinking in complexity terms.

Upvotes: 0

Views: 111

Answers (1)

Julien
Julien

Reputation: 15206

You need to check the complexity for all lines:

def fun(N, M):
    S1 = set(N) # O(n)
    S2 = set(M) # O(m)
    res = [] # O(1)
    for x in S1: # O(n) * ... SEE CORRECTION
        if x in S2: # O(1)
            for i in range(N.count(x)): # O(n) * ... SEE CORRECTION
                res.append(x) # O(1)
    return res # O(1)

Then put it together:

O(n + m + 1 + n*(1+n*1) + 1) = O(n**2 + m)

Correction / subtleties:

N.count(x) is always O(n), but it's O(N)+... not O(n)*.... The * complexity depends on the value of N.count(x) not its complexity. Think of it as:

k = N.count(x) # O(n)
for i in range(k): # O(k)*...

instead of:

for i in range(N.count(x)): # O(???)

Then what I wrote is the worst case (although with a very misleading shortcut) when all elements in N are different. In that case for x in S1: is really O(n)*... but then k = N.count(x) = 1 for all x so for i in range(N.count(x)): is O(n) + O(1)*... (not O(n)*...) still leading to O(n + m + 1 + n*(1+n+1*1) + 1) = O(n**2 + m).

In the best case however when all elements in N are identical, then there is only one value for x so for x in S1: is O(1)*..., but also k = N.count(x) = n for that unique x. so for i in range(N.count(x)): is O(n) + O(n)*... now leading to O(n + m + 1 + 1*(1+n+n*1) + 1) = O(n + m).

(But you typically shouldn't focus on best case complexity.)

Upvotes: 4

Related Questions