Reputation: 1109
I have 2 anagram detection functions; one uses sort and compare and the other keeps track of the number of occurences of each alphabetical character.
Assume here that the two strings passed to the functions are identical, the first randomly generated (unsorted) and the second = to the first, so that both functions execute "all the way" and return True.
According to my algorithmic analysis (obviously probably wrong...), the first function should be 2 * n log(n) + n, and the second function 2 + 2n + 2n + 26 = 28 + 5n.
def anag1(s1, s2):
s1 = list(s1)
s1.sort()
s2 = list(s2)
s2.sort()
for i in range(len(s1)):
if s1[i] != s2[i]:
return False
return True
def anag2(s1, s2):
l1 = [0] * 26
l2 = [0] * 26
for i in s1:
k = ord(i) % 26
l1[k] += 1
for i in s2:
k = ord(i) % 26
l2[k] += 1
return l1 == l2
For 2 strings of 100,000 characters, this should mean that the first function would be 3.4 million time units vs 400,000 time units for the second function.
In other words, the second function should be close to 8 times faster than the first.
However, when I time the execution of both function, the second one is not even twice as fast as the first. For 2 strings of 100,000 characters, the first function executes in 0.085s and the second in 0.055s.
What's going on?
Upvotes: 1
Views: 73
Reputation: 720
Theoretical complexity may have little with actual time spent. Consider:
And not every sort is O(n log (n))
O(n)
or close to.Edit: I implemented both in Java on 100M number long array. It was 383 vs. 161 ms. On 10M the times were almost equal.
O(n)
on arrays with many duplicate values (small character cardinality). Your language may use something similar.Upvotes: 3
Reputation: 8928
Not all primitive functions take the same amount of time. For example, your second method includes divisions, which often take more CPU cycles than other operations. While the first function is O(n log(n)) and the second is O(n), you don't know what the constant factor is.
What your analysis really indicates is that, if you now do it with million character strings, you should see that the difference in speeds widens.
Upvotes: 2