Reputation: 6179
I have implemented the below Naive String search algorithm ('find')
. It's as simple as it can get. However I found another way on 'GeeksforGeeks', ('search')
it looked to have a better complexity. When I tested it for large strings, the results are drastically different but opposite.
1st: Slice the string into pattern length and compare. Move forward one character slice and compare. What should be the complexity of this?
def find(pat, txt):
size = len(pat)
for i in range( len(txt) -size + 1 ):
if txt[i : i + size] == pat:
print 'Pattern found at index %s'%(i)
2nd: Compare character by character. If a character doesn't match break. Else continue. In the end if all the characters matched print the result. Move forward one character. What should be the complexity of this?
def search(pat, txt):
M = len(pat)
N = len(txt)
for i in xrange(N-M+1):
status = 1
for j in xrange(M):
if txt[i+j] != pat[j]:
status = 0
break
if j == M-1 and status != 0:
print "Pattern found at index " + str(i)
Timing test Cases:
testString = ''.join([ 'a' for _ in range(1000*100)] ) + 'b'
testPattern = ''.join([ 'a' for _ in range(100*100) ]) + 'b'
import cProfile
cProfile.run('find(testPattern, testString)')
cProfile.run('search(testPattern, testString)')
for find
Pattern found at index 90000
90007 function calls in 0.160 seconds
For search
Pattern found at index 90000
5 function calls in 135.951 seconds
In my algo find
I do slicing and comparison. The time complexity for slicing is O(k), similarly for comparison it should take another O(k) though not sure.
Python Time Complexity
While in search
we run the loop just 'k' times. So shouldn't it have a better time complexity.
Upvotes: 1
Views: 5105
Reputation: 1
I have implemented naive search code in PYTHON in simple understanding as below. It will return no of time pattern got found.
def naive_pattern_search(data,search):
n = len(data) #Finding length of data
m = len(search) #Finding length of pattern to be searched.
i = 0
count = c = 0 #Taking for counting pattern if exixts.
for j in range(m-1):#Loop continue till length of pattern to be Search.
while i <= (n-1):#Data loop
#if searched patten length reached highest index at that time again initilize with 0.
if j > (m-1):
j = 0
#Data and search have same element then both Index increment by 1.
if data[i]==search[j]:
#print(f"\n{ data[i] } { search[j] }")
#print(f"i : {i} {data[i]} j : {j} {search[j]}")
i+=1
j+=1
count+=1
#If one pattern compared and found Successfully then Its Counter for pattern.
if count== (m-1):
c = c + 1
#Initilise pattern again with 0 for searching with next element in data.
else:
j = 0 #Direct move to 0th index.
i+=1
count=0 #If data not found as per pattern continuously then it will start counting from 0 again.
#Searched pattern occurs more then 0 then its Simply means that pattern found.
if c > 0:
return c;
else:
return -1;
Upvotes: 0
Reputation: 51998
Your two algorithms are essentially the same (as @Zah pointed out) with the only difference that the inner loop in the second algorithm is done by the underlying C code in the first algorithm. What you are observing is the difference between compiled and interpreted code.
If you want all indices and want to exploit the built-in methods:
def findAll(s,t):
"""returns all indices where substring t occurs in string s"""
indices = []
i = s.find(t)
while i > -1:
indices.append(i)
i = s.find(t,i+1)
return indices
For example,
>>> findAll("The cat in the hat","at")
[5, 16]
Upvotes: 5