Reputation: 481
I've written a small, quick (to write, not to run!), algorithm for string matching in Python:
def bruteMatch(n,m):
for i in range(0,len(n)-len(m)+1):
if n[i:len(m)+i]==m:
return(i)
Would the runtime for this algorithm be O(nm)? I'm comparing it to the worst-case runtime for the Horspool string-matching algorithm, which is also (nm). I guess my confusion stems from the fact that my algorithm appears initially as O(n) runtime as it simply iterates through the input n, using the indices as inputs to a slice notation equality statement? Thoughts?
Upvotes: 1
Views: 2348
Reputation: 5960
def bruteMatch(n,m):
for i in range(0,len(n)-len(m)+1):
if n[i:len(m)+i]==m:
return(i)
for i in range(0,len(n)-len(m)+1)
will loop len(n)-len(m)+1
times
if n[i:len(m)+i]==m
compare all characters in n from i to len(m)+i
with m
Example 1
n = "Hello"
m = "Jello"
len(n)-len(m)+1 = 4-4+1 = 1
range(0,1)
i=0
if n[0:4+0] == m ~ n[0:4] == m[:] ~ "Hello" == "Jello"
FALSE
Example 2
n = "Hi"
m = "Jello"
len(n)-len(m)+1 = 2-4+1 = -1
range(0,-1) ~ *LOGICAL ERROR*
Example 3
n = "Hello"
m = "Hi"
len(n)-len(m)+1 = 4-2+1 = 3
range(0,3)
i=0
if n[0:2+0] == m ~ n[0:2] == m[:] ~ "He" == "Hi"
FALSE
i=1
if n[1:2+1] == m ~ n[0:3] == m[:] ~ "Hel" == "Hi"
FALSE
i=2
if n[2:2+2] == m ~ n[0:4] == m[:] ~ "Hell" == "Hi"
FALSE
Conclusion
Your code is going to compare n[0:z]
for z
in range (0, len(n)+1)
for len(n)-len(m)+1
times:
If len(n) == len(m) then check n == m
Else if len(n) > len(n) then return check m is in [ n[0:len(m)], .., n[0:len(n)] ]
Else if len(n) < len(m) then Error: Invalid range
Upvotes: 0
Reputation: 375
This does indeed take O(n*m) time. You run the loop (n-m) times, and the string comparison itself takes (min(n,m)) time. This is fine while either n or m is very small, but consider a worst case where m=n/2:
The loop is executed (n-n/2) times, and the comparison takes (n/2) time, a total of (O(n^2)) time - not so great!
If performance is important, and the search string is large, consider using a hash-based algorithm such as Rabin–Karp.
Hope this helps!
Upvotes: 0
Reputation: 958
Worst case is O(nm), yes. Because, you see, the ==
test tests each character in each string, which could take as long as the length of the shortest string in the equality test.
Upvotes: 1