Reputation: 49
The data read from the files is stored in a dictionary whose keys are the file names; each value is another dictionary that maps words to their number of occurrences in that file. An example of the structure is below:
data = {
    'file1.txt': {
        'word1': 45,
        'word2': 23,
        ...
    },
    'file2.txt': {
        'word3': 25,
        'word4': 10,
        ...
    },
    ...
}
I want to work out the time complexity of the Python code below.
from collections import Counter

def most_common_words(data):  # wrapper added so the bare return is valid; the name is made up
    words = Counter()
    for file in data:
        words.update(data[file])
    return words.most_common(5)
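Called on the sample data above, this returns the five most frequent words across all the files combined, e.g. print(most_common_words(data)).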
Some thoughts: iterating over the outer dictionary is O(N), where N is the number of files. But the Counter is also updated with every word inside each file, so the complexity also depends linearly on the number of words per file. It can also make a difference whether a file's words are repeated or not: the code spends more time on a file with 5 different words than on a file with the same word occurring 5 times, because the per-file dict stores one entry per distinct word.
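A minimal sketch of that last point (the words and counts are made up): since each per-file dict already collapses repeats into a single entry, Counter.update touches one key per distinct word, not one per occurrence.

from collections import Counter

words = Counter()
# A file with 5 different words: 5 keys to merge into the Counter.
words.update({'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1})
# A file where one word occurred 5 times: only 1 key to merge.
words.update({'a': 5})
print(words['a'])  # 6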
Also, the most_common() method has O(K log N) complexity, with K the number of files and N the number of words.
Is my assumption correct? Or am I missing something?
So, what is the total complexity of the code above?
Upvotes: 0
Views: 171
Reputation: 5860
Big O notation represents the worst-case complexity of the code. If I denote the files as F1, F2, F3, ..., Fn and write Wi for the number of words in the i-th file, the loop's running time behaves like

W1 + W2 + ... + Wn
So if the number of files is n and the maximum number of words in any file is k, the worst-case complexity (when every file has the maximum number of words) is

O(n * k), where k is max(W1, W2, ..., Wn)
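As a toy illustration (the per-file word counts here are invented):

# Hypothetical distinct-word counts W1..Wn for n = 3 files
W = [3, 5, 2]
total_work = sum(W)       # 10 update operations in the loop
bound = len(W) * max(W)   # n * k = 3 * 5 = 15, an upper bound on total_work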
And in the case of most_common: if the word counts were already kept in sorted order (for example in a sorted dict), which they are not in a normal dictionary, you would get a complexity of O(k). Since they are not sorted, the complexity is the same as above, because it has to iterate over the words collected from each file.
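For what it's worth, in current CPython Counter.most_common(n) with a fixed n is (as far as I know) implemented with heapq.nlargest, so a sketch of what the call effectively does is below (the counts are just taken from the example data):

import heapq
from collections import Counter
from operator import itemgetter

words = Counter({'word1': 45, 'word2': 23, 'word3': 25, 'word4': 10})
# Scan every distinct word once, keeping a small heap of the 5 largest counts.
top5 = heapq.nlargest(5, words.items(), key=itemgetter(1))
print(top5)  # same result as words.most_common(5)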
Upvotes: 1