Reputation: 493
If I have a string like abcabcabc
and here, clearly, abc
is a pattern. I want to find out patterns using c/c++.
I don't want the implementation. A pseudo-code/algorithm will do just fine.
How can I do it?
Upvotes: 1
Views: 739
Reputation: 353
One way to find out a pattern withn itself is using Knuth-Morris-Pratt's algorithm's precomputation algorithm with a time complexity of O(P.length) where P is a given string, which computes a lookup table 'PI' that holds the lengths of the longest suffix that matches its corresponding prefix ("a", "ab", "abc", ...).
Pseudocode taken from Introduction to Algorithms, CLRS. Also, Linux has a decent implementation of the above algorithm.
Therefore, P.length - PI[P.length] = k = length of the smallest repeating pattern. Remember, k will always remain in the range [0, P.length).
For eg, "abcabcabc" = PI[0, 0, 0, 1, 2, 3, 4, 5, 6]. Here, the length of the smallest repeating pattern would be 9 - 6 = 3. But does k divide the string equally?
Hence, if P.length mod k == 0 ? P[ 1..k ] would be your repeating pattern.
Upvotes: 0
Reputation: 15875
Use floyd cycle finding algorithm. This uses slow-fast analogy to find cycle. Python source code given from Wikipedia:
def floyd(f, x0):
# Main phase of algorithm: finding a repetition x_i = x_2i
# The hare moves twice as quickly as the tortoise and
# the distance between them increases by 1 at each step.
# Eventually they will both be inside the cycle and then,
# at some point, the distance between them will be
# divisible by the period λ.
tortoise = f(x0) # f(x0) is the element/node next to x0.
hare = f(f(x0))
while tortoise != hare:
tortoise = f(tortoise)
hare = f(f(hare))
# At this point the tortoise position, ν, which is also equal
# to the distance between hare and tortoise, is divisible by
# the period λ. So hare moving in circle one step at a time,
# and tortoise (reset to x0) moving towards the circle, will
# intersect at the beginning of the circle. Because the
# distance between them is constant at 2ν, a multiple of λ,
# they will agree as soon as the tortoise reaches index μ.
# Find the position μ of first repetition.
mu = 0
tortoise = x0
while tortoise != hare:
tortoise = f(tortoise)
hare = f(hare) # Hare and tortoise move at same speed
mu += 1
# Find the length of the shortest cycle starting from x_μ
# The hare moves one step at a time while tortoise is still.
# lam is incremented until λ is found.
lam = 1
hare = f(tortoise)
while tortoise != hare:
hare = f(hare)
lam += 1
return lam, mu
Time complexity of this solution is O(λ, μ)
and auxiliary space is O(1)
.
Upvotes: 1
Reputation: 402
Try looking up this: http://en.wikipedia.org/wiki/Cycle_detection Don't think of it as a string, but rather finding a period. Whether or not it is a string does not matter.
Upvotes: 0