Reputation: 508
I wrote the following simple function that checks whether str1 is a permutation of str2:
def is_perm(str1, str2):
    return True if sorted(str1) == sorted(str2) else False
Assuming that sorted(str) has a time complexity of O(n*logn), we can expect a time complexity of O(2*n*logn) = O(n*logn). The following function is an attempt to achieve a better time complexity:
def is_perm2(str1, str2):
    dict1 = {}
    dict2 = {}
    for char in str1:
        if char in dict1:
            dict1[char] += 1
        else:
            dict1[char] = 1
    for char in str2:
        if char in dict2:
            dict2[char] += 1
        else:
            dict2[char] = 1
    if dict1 == dict2:
        return True
    else:
        return False
Each for-loop iterates n times. Assuming that dictionary lookup and both dictionary updates have constant time complexity, I expect an overall complexity of O(2n) = O(n). However, timeit measurements show the following contradictory results. Why is is_perm2 slower than is_perm over 10000000 executions, even though its time complexity looks better? Are my assumptions wrong?
import timeit
print(timeit.timeit('is_perm("helloworld","worldhello")', 'from __main__ import is_perm', number=10000000))
print(timeit.timeit('is_perm2("helloworld","worldhello")', 'from __main__ import is_perm2', number=10000000))
# output of first print-call: 12.4199592999993934 seconds
# output of second print-call: 37.13826630001131 seconds
Upvotes: 3
Views: 308
Reputation: 349964
There is no guarantee that an algorithm with a time complexity of O(nlogn) will be slower than one with a time complexity of O(n) for a given input. The second one could, for instance, have a large constant overhead that makes it slower for all input sizes below some threshold (say, 100000).
In your test the input size is only 10 ("helloworld"), which doesn't tell us much. Repeating that test, even 10000000 times, doesn't change that: the repetition only gives a more precise estimate of the average time spent on that particular input.
You would need to feed the algorithms increasingly large inputs. If memory allows, that would eventually bring us to an input size for which the O(nlogn) algorithm takes more time than the O(n) algorithm.
In this case, I found that the input size had to be really large in comparison with available memory, and I only barely managed to find a case where the difference showed:
import random
import string
import timeit

def shuffled_string(s):
    # return a random permutation of s
    lst = list(s)
    random.shuffle(lst)
    return "".join(lst)

def random_string(size):
    # random string of the given length, drawn from printable characters
    return "".join(random.choices(string.printable, k=size))

str1 = random_string(10000000)
str2 = shuffled_string(str1)

print("start")
print(timeit.timeit(lambda: is_perm(str1, str2), number=5))
print(timeit.timeit(lambda: is_perm2(str1, str2), number=5))
After the initial setup of the strings (each 10 million characters long), I got this output:
54.72847577700304
51.07616817899543
The reason the input has to be so large to see this happen is that sorted does all the hard work in lower-level, compiled code (C, in the case of CPython), while the second solution does all the looping and character reading in Python code, which is typically interpreted. The overhead of the second solution is clearly huge in comparison with the first.
Although not your question, we could improve the implementation of the second algorithm by relying on collections.Counter:
from collections import Counter

def is_perm3(str1, str2):
    # Counter builds the character-frequency mapping for us
    return Counter(str1) == Counter(str2)
With the same test setup as above, I got this timing for this implementation:
24.917681352002546
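For completeness, a quick sanity check of is_perm3 on the strings from the question (a minimal usage sketch; the expected results are shown as comments):
print(is_perm3("helloworld", "worldhello"))   # expected: True
print(is_perm3("helloworld", "helloworlds"))  # expected: False (different length)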
Upvotes: 2
Reputation: 170
Assuming that dictionary lookup and both dictionary updates have constant time complexity,
A Python dictionary is a hash map, so a single lookup or update costs O(n) in the worst case (when many keys hash to the same bucket). The average-case time complexity of is_perm2 is therefore O(n), but its worst-case time complexity is O(n^2).
If you want a guaranteed O(n) time complexity, use a list (not a dictionary) to store the frequency of the characters: convert each character to its ASCII number and use it as an index into a Python list, as sketched below.
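A minimal sketch of that suggestion, mirroring the structure of is_perm2 (the is_perm4 name and the ASCII-only assumption are illustrative, not part of the answer):
def is_perm4(str1, str2):
    # assumes both strings contain only ASCII characters (ord(c) < 128)
    counts1 = [0] * 128
    counts2 = [0] * 128
    for c in str1:
        counts1[ord(c)] += 1   # frequency table for str1
    for c in str2:
        counts2[ord(c)] += 1   # frequency table for str2
    return counts1 == counts2
Since list indexing never degrades the way a pathological run of hash collisions can, each update is O(1) even in the worst case, giving a guaranteed O(n) overall.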
Upvotes: 0