kMutagene
kMutagene

Reputation: 371

F sharp KMP Algorithm is stuck in the first while loop if i use a pattern with the same characters at the first two indces

I am playing around with the KMP algorithm in f sharp. While it works for patterns like "ATAT" (result will be [|0; 0; 1; 2;|]) , the first while loop enters a deadlock when the first 2 characters of a string are the same and the 3rd is another, for example "AAT".

I understand why: first, i gets incremented to 1. now the first condition for the while loop is true, while the second is also true, because "A" <> "T". Now it sets i to prefixtable.[!i], which is 1 again, and here we go.

Can you guys give me a hint on how to solve this?

let kMPrefix (pattern : string) = 

    let (m : int) = pattern.Length - 1
    let prefixTable = Array.create pattern.Length 0
    // i : longest proper prefix that is also a suffix 
    let i = ref 0

    // j: the index of the pattern for which the prefix value will be calculated
    // starts with 1 because the first prefix value is always 0

    for j in 1 .. m do

        while !i > 0 && pattern.[!i] <> pattern.[j] do

            i := prefixTable.[!i]

        if pattern.[!i] = pattern.[j] then

            i := !i+1           


        Array.set prefixTable j !i  

    prefixTable

Upvotes: 1

Views: 160

Answers (1)

Vandroiy
Vandroiy

Reputation: 6223

I'm not sure how to repair the code with a small modification, since it doesn't match the KMP algorithm's lookup table contents (at least the ones I've found on Wikipedia), which are:

  • -1 for index 0
  • Otherwise, the count of consecutive elements before the current position that match the beginning (excluding the beginning itself)

Therefore, I'd expect output for "ATAT" to be [|-1; 0; 0; 1|], not [|0; 0; 1; 2;|].

This type of problem might be better to reason about in functional style. To create the KMP table, you could use a recursive function that fills the table one by one, keeping track of how many recent characters match the beginning, and start running it at the second character's index.

A possible implementation:

let buildKmpPrefixTable (pattern : string) =
    let prefixTable = Array.zeroCreate pattern.Length

    let rec run startIndex matchCount =
        let writeIndex = startIndex + matchCount
        if writeIndex < pattern.Length then
            if pattern.[writeIndex] = pattern.[matchCount] then
                prefixTable.[writeIndex] <- matchCount
                run startIndex (matchCount + 1)
            else
                prefixTable.[writeIndex] <- matchCount
                run (writeIndex + 1) 0
    run 1 0
    if pattern.Length > 0 then prefixTable.[0] <- -1
    prefixTable

This approach isn't in danger of any endless loops/recursion, because all code paths of run either increase writeIndex in the next iteration or finish iterating.

Note on terminology: the error you are describing in the question is an endless loop or, more generally, non-terminating iteration. Deadlock refers specifically to a situation in which a thread waits for a lock that will never be released because the thread holding it is itself waiting for a lock that will never be released for the same reason.

Upvotes: 3

Related Questions