panos
panos

Reputation: 49

Code complexity when iterating inside python dictionaries

Data from file read, are stored in a dictionary with file titles as keys, and as value, there is another dictionary, which has words as keys and their occurrence as value. An example of the structure is below:

data = {
    'file1.txt': {
        'word1': 45,
        'word2': 23,
        ...
    }
    'file2.txt': {
        'word3': 25,
        'word4': 10
        ...
    }
    ...
}

I want to calculate the code complexity of the python code above.

words = Counter()
for file in data:
    words.update(data[file])
return words.most_common(5)

Some thoughts: Dictionary iterating has a big O notation O(N), when N is the number of files. But, also the new Counter() is updated for every word inside the file, which means that the code complexity has a linear dependency on the number of words inside the file. But also, it can make a difference if a file has words repeated or not. The code consumes more time if, for example, a file has 5 different words, than having the same word 5 times.

Also, most_common() method has a O(KlogN) complexity, with K the number of files, and N the number of words.

Is my assumption correct? Or am I missing something?

So, what is the total complexity of the code above?

Upvotes: 0

Views: 171

Answers (1)

hashcode55
hashcode55

Reputation: 5860

Big Oh notation represents the worst case complexity of the code. If I represent files as F1, F2, F3....Fn and words inside each file as W1 as number of words in 1st file and so on, the time would be like -

F1*W1 + F2*W2....Fn*Wn

So if the number of files are n and the maximum number of words in any file is k the worst case (when all the files have maximum number of words) complexity will be -

O(n * k) where k is max(W1, W2....Wn)

And in case of most_common

If the word counts are in sorted order(using a sorted dict) which I doubt if it is a normal dictionary then you'll get a complexity of O(k). If it isn't sorted then the complexity would be same as above as it'll have to iterate over the list of each words in each file.

Upvotes: 1

Related Questions