Elliot
Elliot

Reputation: 493

How can I check for a pattern in a string in C/C++?

If I have a string like abcabcabc and here, clearly, abc is a pattern. I want to find out patterns using c/c++.

I don't want the implementation. A pseudo-code/algorithm will do just fine.

How can I do it?

Upvotes: 1

Views: 739

Answers (3)

Achilles Rasquinha
Achilles Rasquinha

Reputation: 353

One way to find out a pattern withn itself is using Knuth-Morris-Pratt's algorithm's precomputation algorithm with a time complexity of O(P.length) where P is a given string, which computes a lookup table 'PI' that holds the lengths of the longest suffix that matches its corresponding prefix ("a", "ab", "abc", ...).

enter image description here

Pseudocode taken from Introduction to Algorithms, CLRS. Also, Linux has a decent implementation of the above algorithm.

Therefore, P.length - PI[P.length] = k = length of the smallest repeating pattern. Remember, k will always remain in the range [0, P.length).

For eg, "abcabcabc" = PI[0, 0, 0, 1, 2, 3, 4, 5, 6]. Here, the length of the smallest repeating pattern would be 9 - 6 = 3. But does k divide the string equally?

Hence, if P.length mod k == 0 ? P[ 1..k ] would be your repeating pattern.

Upvotes: 0

Kaidul
Kaidul

Reputation: 15875

Use floyd cycle finding algorithm. This uses slow-fast analogy to find cycle. Python source code given from Wikipedia:

def floyd(f, x0):
    # Main phase of algorithm: finding a repetition x_i = x_2i
    # The hare moves twice as quickly as the tortoise and
    # the distance between them increases by 1 at each step.
    # Eventually they will both be inside the cycle and then,
    # at some point, the distance between them will be
    # divisible by the period λ.
    tortoise = f(x0) # f(x0) is the element/node next to x0.
    hare = f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))

    # At this point the tortoise position, ν, which is also equal
    # to the distance between hare and tortoise, is divisible by
    # the period λ. So hare moving in circle one step at a time, 
    # and tortoise (reset to x0) moving towards the circle, will 
    # intersect at the beginning of the circle. Because the 
    # distance between them is constant at 2ν, a multiple of λ,
    # they will agree as soon as the tortoise reaches index μ.

    # Find the position μ of first repetition.    
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)   # Hare and tortoise move at same speed
        mu += 1

    # Find the length of the shortest cycle starting from x_μ
    # The hare moves one step at a time while tortoise is still.
    # lam is incremented until λ is found.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1

    return lam, mu

Time complexity of this solution is O(λ, μ) and auxiliary space is O(1).

Upvotes: 1

GettnDer
GettnDer

Reputation: 402

Try looking up this: http://en.wikipedia.org/wiki/Cycle_detection Don't think of it as a string, but rather finding a period. Whether or not it is a string does not matter.

Upvotes: 0

Related Questions