esac
esac

Reputation: 24685

How to find a repeating pattern in a list of strings?

I have a list of methods from a trace that is taken of a process. I want to detect if anything is essentially called in a loop. For example, I might have

method1
    method2
    method2
    method2

Might display 'method2 called by method1 3 times in a possible loop'.

method3
    method1
    methodd
    methoda
    methodb
    methodc
    methodd
    methoda
    methodb
    methodc
    methodd

Would display 'methoda, methodb, methodc, methodd 2 times in a possible loop'.

Obviously, there is no guarantee that there is a loop, but at least we know that there is a repeating pattern. The only input is the list of the children (for example from above method2, method2, method2).

Upvotes: 3

Views: 1648

Answers (2)

jameszhao00
jameszhao00

Reputation: 7301

Build a suffix array along with longest common prefixes and find K consecutive LCPs with values > threshold (where K is another threshold)

In the search, require that the found suffixes' candidate LCPs are consecutive in the source string/do not overlap. Search algorithm:

let minimum_iterations = 1
let minimum_pattern_length = 2
let src = source string
let suffix = sorted suffix array, 
             where each entry is suffix i0 (inclusive) to |src| (exclusive)
let lcp = longest common prefixes, 
             where lcp(i) = longest common prefix of suffix(i) and suffix(i-1) when i > 0,
             or 0 if i = 0

for i = 0 to |suffix|:
    if lcp[i] < minimum_pattern_length: continue        
    let j = i
    let iterations = 0
    let last_matched_i0 = suffix[i-1].i0
    while j < |suffix| and lcp[j] >= lcp[i]:
        if (suffix[j].i0 + lcp[i] + 1) = last_matched_i0: 
            iterations++
            last_matched_i0 = suffix[j].i0
        j++

    print 'pattern = ' .. suffix[i][0..lcp[i]] .. ' iterations = ' .. iterations
    print 'starting at ' .. suffix[j-1].i0 .. ' to ' .. suffix[i-1].i0 + lcp[i]

    skip pattern if wanted

(suffix[j].i0 + lcp[i] + 1) = last_matched_i0 Make sure that the matched pattern is consecutive. We assume suffix[j] includes suffix[j-1] if they're both patterns, because in a sorted suffix array ab will always come before abab

Example: 3ababab65

Suffixes

  • 3ababab65
  • ababab65
  • babab65
  • abab65
  • bab65
  • ab65
  • b65
  • 65
  • 5

Sorted Suffixes along with Longest Common Prefix (LCP - Suffix): LCP(i) is the longest common prefix between SUFFIX(i) and SUFFIX(i-1)

  • 0 - 5
  • 0 - 65
  • 0 - 3ababab65
  • 0 - ab65
  • 2 - abab65
  • 4 - ababab65
  • 0 - b65
  • 1 - bab65
  • 3 - babab65

LCP Candidates:

  • ab65, abab65, ababab65 - consecutive, no overlap
  • abab65, ababab65 - overlap
  • b65, bab65 - not consecutive
  • bab65, babab65 - overlap

Possible patterns:

  • ab

You have to modify the code a bit to handle the following case:

ababa

You will have suffixes of aba, and ababa, with LCP of 3


Or you could just do this... Shortest Repeating Sub-String ... Donno how efficient it is... but way easier.

Upvotes: 1

bkmtan
bkmtan

Reputation: 101

You could insert the list of children into a linked list as per the order, then run a linked list loop detection algorithm.

e.g. have 2 pointers, 1 pointer that moves a node at a time and another pointer that moves 2 nodes at a time, if a loop exists, the faster pointer will eventually be at the same node as the slower one

Upvotes: 1

Related Questions