Why does the second version of this run in exponential time?

Question

I am writing a program to determine if the Levenshtein distance between two strings is exactly 2 in linear time.

I have an algorithm which does this. I use the naive recursive approach which scans the string and when it finds a discrepancy it branches into 3 options trying either to delete, insert or replace. However I make a modification that if we exceed 2 as our total we just give up on that branch. This limits the total number of branches to 9 and makes the algorithm linear.

Here's my initial implementation:

diffIs2 x y =
  2 == go 0 x y
  where
    go total xs@(x1:xs') ys@(y1:ys')
      | x1 == y1
      =
        go total xs' ys'
      | total < 2
      =
        minimum $ zipWith (go $ total + 1)
          [ xs', xs , xs' ]
          [ ys , ys', ys' ]
    go total xs ys =
      total + length xs + length ys

Testing confirms that this runs in linear time.

I also have a second similar program which as far as I can tell should be linear as well.

The idea here is to use short circuiting and lazy evaluation to limit the number of branches evaluated instead. We always allow a branching however, when we branch we take the minimum of the result of each and 3. That way if the branch total exceeds 3 the short circuit should prevent further evaluation.

We have to implement a natural number type for partial evaluation.

data Nat
  = Zero
  | Succ Nat
  deriving
    ( Eq
    , Ord
    )

natLength :: [a] -> Nat
natLength [] =
  Zero
natLength (a : b) =
  Succ (natLength b)

nat3 =
  Succ $ Succ $ Succ Zero

diffIs2 x y =
  Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1
      =
        go xs' ys'
      | otherwise
      =
        Succ $
          minimum $
            map (min nat3) $
              zipWith go
                [ xs', xs , xs' ]
                [ ys , ys', ys' ]
    go xs ys =
      natLength $ xs ++ ys

Testing this reveals it doesn't run in linear time. Something makes it exponential but I can't figure out what. The short circuiting does work as expected. We can show this with the following program which halts in finite time because of the short circuiting of min:

f = Succ f

main =
  print $ (Succ Zero) == min (Succ Zero) f

but when put together in the algorithm it doesn't seem to work as I expect.

Why is the second code exponential? What is my misconception?

Why does the second version of this run in exponential time?

Answers (1)

Related Questions