newbie_coder

Reputation: 75

What is the time complexity of this function that iterates through a list and creates a dictionary?

I have a function that rearranges an input list in a certain way and returns the output list. I am confused about what the time and space complexity of the function will be. Below is the code:

def rearrange_list(inp_list):
    d = {}
    output_list = []
    for num in inp_list:
        if num in d:
            d[num] += 1
        else:
            d[num] = 0
            output_list.append(num)
    for k,v in d.items():
        if v > 0:
            for i in range(v):
                output_list.append(k)
    return output_list
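
For example, unique values keep their first-occurrence order and all of the duplicates get moved to the back (output traced by hand from the code above):

>>> rearrange_list([3, 1, 3, 2, 1, 3])
[3, 1, 2, 3, 3, 1]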

This is my complexity analysis: I concluded that the time complexity is O(n + m^2), where n is the length of the input list and m is the number of unique values (the number of keys in the dictionary).

The main confusion I have is: should I consider iterating through the dictionary to be O(n) too, since in the worst case we will have n items in it, or should I represent it by m like I did in my analysis, since it can be anything from 0 to n?

Thank you in advance for your help!

Upvotes: 4

Views: 1893

Answers (3)

kcsquared

Reputation: 5409

Your time and space complexity are both Theta(n). While sometimes it can be useful for clarity to include terms in the time or space complexity that don't change the asymptotic value (a classic example being string searching algorithms), it doesn't make as much sense here.

While your claim of O(n + m^2) time complexity is technically correct as Big-O notation is an upper bound, you can show that O(n) is also an upper bound, since the dictionary has size at most n and we iterate over each key exactly once: there are n items read from input, at most n loop iterations for the dictionary, and n items appended to the output list.
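
You can check that operation count empirically (a sketch; the dict_ops and appends counters are my additions, not part of the original function):

def rearrange_list_counted(inp_list):
    d = {}
    output_list = []
    dict_ops = appends = 0
    for num in inp_list:
        dict_ops += 1              # one membership test / insert per input item
        if num in d:
            d[num] += 1
        else:
            d[num] = 0
            output_list.append(num)
            appends += 1
    for k, v in d.items():
        for i in range(v):
            output_list.append(k)
            appends += 1
    # dict_ops == len(inp_list) and appends == len(inp_list):
    # n dictionary operations and n appends in total, i.e. Theta(n)
    return output_list, dict_ops, appends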

If you wanted, you could calculate the 'auxiliary' space required, which is the amount of space needed not counting the input or output arrays. Here, that would be Theta(m). You should note, however, that such analyses are fairly uncommon: by convention, unless specified otherwise, space complexity analysis includes the size of the output.


To address a common confusion about why the second loop is still linear time with many duplicate values, let's look at an example.

The lines in question are:

for k, v in d.items():
    if v > 0:
        for i in range(v):
            output_list.append(k)

Suppose our input list was [1, 1, 1, 1, 1, 1, 1, 2, 2, 2] (ten elements total: seven '1's and three '2's).

Then our dictionary.items() (which has the count of each element, minus one) would look like: [(key: 1, value: 6), (key: 2, value: 2)] (it's not really stored as a Python list of tuples internally, but these are the full contents of the items view-object).
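
You can confirm those contents by re-running the first loop from the question on this input (the final print is mine):

inp_list = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
d = {}
for num in inp_list:
    if num in d:
        d[num] += 1
    else:
        d[num] = 0
print(list(d.items()))  # prints [(1, 6), (2, 2)]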

Let's walk through that second loop's operation, line by line:

for k, v in [(key: 1, value: 6), (key: 2, value: 2)]: 
# On our first iteration, so k = 1, v = 6.

    if 6 > 0:                      # 1 operation
        for i in range(6):         # 6 operations
            output_list.append(1)  # 6 operations

# For the (k, v) pair of (1, 6), the inner-loop has run 6 times.
# Every time the inner-loop runs, the length of output_list
# increases by 1

# Second iteration of outer-loop runs again:
for k, v in [(key: 1, value: 6), (key: 2, value: 2)]: 
# On our second iteration, so k = 2, v = 2.

    if 2 > 0:                      # 1 operation
        for i in range(2):         # 2 operations
            output_list.append(2)  # 2 operations

# For the (k, v) pair of (2, 2), the inner loop has run 2 times.
# In total: 8 inner-loop iterations, and output_list has len 8

In informal complexity analysis, a 'standard' rule of thumb is that the run-time of a double-nested loop is often quadratic. This is because we're counting the total number of inner-loop iterations, for example

for i in range(n):
    for j in range(n):

as

(n inner-loops per outer-loop) * (n outer-loops) = n^2 inner-loops

This assumption shouldn't be applied when the number of inner-loop iterations varies dramatically based on the state of the outer loop. In our example, the number of inner-loop iterations is v, which depends on the outer loop.

To find the runtime here, we need a different way to count how many inner-loop iterations occur. Luckily, we can do that: in each inner-loop iteration, we append an element to output_list.

Since the final length of output_list is n, we know that the inner-loop has executed at most n times (technically, it's executed exactly n-m times, since the output_list already has size m after the earlier dictionary-initializing loop has terminated). Instead of incorrectly multiplying this by m, the number of outer-loop iterations, we should instead add the inner and outer loop iterations for a runtime of Theta(n+m) which is Theta(n).
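
Here is that count made concrete with the example input (a sketch; the inner_iterations counter and the asserts are mine):

inp_list = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]   # n = 10, m = 2
d = {1: 6, 2: 2}                            # state after the first loop
output_list = [1, 2]                        # already holds the m unique values

inner_iterations = 0
for k, v in d.items():
    if v > 0:
        for i in range(v):
            output_list.append(k)
            inner_iterations += 1

assert inner_iterations == len(inp_list) - len(d)  # exactly n - m = 8
assert len(output_list) == len(inp_list)           # final length is n = 10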


Addendum: Comments have correctly pointed out that Python dictionaries don't have an O(1) amortized worst-case lookup/insert time guarantee, so the first loop is, in the worst case, Omega(m*n). While Python uses pseudo-random probing on an open-addressing table, this only ensures good 'average' performance. Thanks to Kelly Bundy for the highly informative discussion and corrections.
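
To see how that worst case can arise, here is a contrived sketch (AlwaysCollides is a hypothetical key type of mine, not anything from the question): every key hashes to the same bucket, so the i-th insert must probe past all i previously inserted keys.

class AlwaysCollides:
    """Key type whose instances all land in the same hash bucket."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return 0  # constant hash: every key collides
    def __eq__(self, other):
        return isinstance(other, AlwaysCollides) and self.x == other.x

d = {}
for i in range(10_000):
    d[AlwaysCollides(i)] = 0  # i-th insert probes O(i) slots: Theta(n^2) total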

Unfortunately, while O(1) amortized worst-case lookup/insert hashing is possible (for example, with cuckoo hashing), in practice it is significantly slower on average than what most standard libraries currently use, and that is unlikely to change in the near future.

Upvotes: 3

Tomer Geva

Reputation: 1834

Time complexity

You can divide the code into two parts.

Part 1

for num in inp_list:
    if num in d:
        d[num] += 1
    else:
        d[num] = 0
        output_list.append(num)

In the first line you iterate through inp_list, and in each iteration you call if num in d. If Python searched through the keys of dictionary d linearly, the time complexity of this part would be O(nm), where n is the size of inp_list and m is the number of unique values in inp_list.

Part 2

for k,v in d.items():
    if v > 0:
        for i in range(v):
            output_list.append(k)

In this part you iterate through the dictionary in the first row, which is O(m). The nested for loop can be replaced by the following:

output_list += [k] * v

which takes O(v) time for each key. Summed over all keys, the v values add up to at most n, so the appends contribute O(n) overall.

In conclusion, the time complexity should be O(nm + m + n) = O(nm).

Nevertheless, since d is a dictionary, the key lookup takes O(1) on average instead of O(m) (see more here), making the time complexity drop to O(n).
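
For comparison, here is a sketch of the same behavior written with collections.Counter (my rewrite, not the original code; it relies on dicts and Counters preserving insertion order, so Python 3.7+):

from collections import Counter

def rearrange_list_counter(inp_list):
    counts = Counter(inp_list)        # one count per unique value: O(n)
    output_list = list(counts)        # unique values in first-occurrence order
    for k, c in counts.items():
        output_list += [k] * (c - 1)  # append the remaining duplicates
    return output_list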

Space complexity

Since a dictionary with m keys is created (where m is the number of unique values in inp_list), and m <= n, the space complexity is O(n + m) = O(n).

Upvotes: 1

Abhinav Mathur

Reputation: 8101

Do a step by step breakdown of the algorithm:

  1. First for loop (for num in inp_list).
    It simply iterates over the list, which takes O(N) time, and the dictionary has size O(number of unique elements in list), which can be generalized as O(N) space (where N = length of list = max possible number of unique keys).
  2. Second for loop (for k,v in d.items()).
    This iterates over the keys in dict, which are O(number of unique elements in list) in count.
  3. Third for loop (for i in range(v)).
    Since the summation of v would just be the count of duplicate elements in the list, it has an upper bound of O(N) (see the sketch below).

A better approximation of the algorithm is O(N) time and space, rather than the proposed O(n + m^2).
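
A quick check of that summation claim from step 3 (a sketch; the variable names are mine):

from collections import Counter

inp = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
counts = Counter(inp)
m = len(counts)                                    # number of unique values: 2
total_inner = sum(c - 1 for c in counts.values())
assert total_inner == len(inp) - m                 # 10 - 2 == 8 duplicates appended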

Upvotes: 2
