Naive String Search Algorithm - Python

Question

I have implemented the below Naive String search algorithm ('find'). It's as simple as it can get. However I found another way on 'GeeksforGeeks', ('search') it looked to have a better complexity. When I tested it for large strings, the results are drastically different but opposite.

1st: Slice the string into pattern length and compare. Move forward one character slice and compare. What should be the complexity of this?

def find(pat, txt):
    size = len(pat)
    for i in range( len(txt) -size + 1 ):
        if txt[i : i + size] == pat:
            print 'Pattern found at index %s'%(i)

2nd: Compare character by character. If a character doesn't match break. Else continue. In the end if all the characters matched print the result. Move forward one character. What should be the complexity of this?

def search(pat, txt):
    M = len(pat)
    N = len(txt)

    for i in xrange(N-M+1):
        status = 1 
        for j in xrange(M):
            if txt[i+j] != pat[j]:
                status = 0
                break
        if j == M-1 and status != 0:
            print "Pattern found at index " + str(i)

Timing test Cases:

testString = ''.join([ 'a' for _ in range(1000*100)] ) + 'b'
testPattern = ''.join([ 'a' for _ in range(100*100) ])  + 'b'

import cProfile
cProfile.run('find(testPattern, testString)')
cProfile.run('search(testPattern, testString)')

for find

Pattern found at index 90000
         90007 function calls in 0.160 seconds

For search

Pattern found at index 90000
         5 function calls in 135.951 seconds

In my algo find I do slicing and comparison. The time complexity for slicing is O(k), similarly for comparison it should take another O(k) though not sure. Python Time Complexity

While in search we run the loop just 'k' times. So shouldn't it have a better time complexity.

John Coleman · Accepted Answer

Your two algorithms are essentially the same (as @Zah pointed out) with the only difference that the inner loop in the second algorithm is done by the underlying C code in the first algorithm. What you are observing is the difference between compiled and interpreted code.

If you want all indices and want to exploit the built-in methods:

def findAll(s,t):
    """returns all indices where substring t occurs in string s"""
    indices = []
    i = s.find(t)
    while i > -1:
        indices.append(i)
        i = s.find(t,i+1)
    return indices

For example,

>>> findAll("The cat in the hat","at")
[5, 16]

Naive String Search Algorithm - Python

Answers (2)

Related Questions