Wheat Wizard
Wheat Wizard

Reputation: 4219

Why does the second version of this run in exponential time?

I am writing a program to determine if the Levenshtein distance between two strings is exactly 2 in linear time.

I have an algorithm which does this. I use the naive recursive approach which scans the string and when it finds a discrepancy it branches into 3 options trying either to delete, insert or replace. However I make a modification that if we exceed 2 as our total we just give up on that branch. This limits the total number of branches to 9 and makes the algorithm linear.

Here's my initial implementation:

diffIs2 x y =
  2 == go 0 x y
  where
    go total xs@(x1:xs') ys@(y1:ys')
      | x1 == y1
      =
        go total xs' ys'
      | total < 2
      =
        minimum $ zipWith (go $ total + 1)
          [ xs', xs , xs' ]
          [ ys , ys', ys' ]
    go total xs ys =
      total + length xs + length ys

Testing confirms that this runs in linear time.

I also have a second similar program which as far as I can tell should be linear as well.

The idea here is to use short circuiting and lazy evaluation to limit the number of branches evaluated instead. We always allow a branching however, when we branch we take the minimum of the result of each and 3. That way if the branch total exceeds 3 the short circuit should prevent further evaluation.

We have to implement a natural number type for partial evaluation.

data Nat
  = Zero
  | Succ Nat
  deriving
    ( Eq
    , Ord
    )

natLength :: [a] -> Nat
natLength [] =
  Zero
natLength (a : b) =
  Succ (natLength b)

nat3 =
  Succ $ Succ $ Succ Zero

diffIs2 x y =
  Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1
      =
        go xs' ys'
      | otherwise
      =
        Succ $
          minimum $
            map (min nat3) $
              zipWith go
                [ xs', xs , xs' ]
                [ ys , ys', ys' ]
    go xs ys =
      natLength $ xs ++ ys

Testing this reveals it doesn't run in linear time. Something makes it exponential but I can't figure out what. The short circuiting does work as expected. We can show this with the following program which halts in finite time because of the short circuiting of min:

f = Succ f

main =
  print $ (Succ Zero) == min (Succ Zero) f

but when put together in the algorithm it doesn't seem to work as I expect.

Why is the second code exponential? What is my misconception?

Upvotes: 6

Views: 311

Answers (1)

Noughtmare
Noughtmare

Reputation: 10695

While the default min is lazy enough for your simple example at the end of your question, it is not as lazy as we would hope:

ghci> let inf = Succ inf
ghci> inf `min` inf `min` Zero == Zero
^CInterrupted.

To fix it we need a lazier min:

min' :: Nat -> Nat -> Nat
min' Zero _ = Zero
min' _ Zero = Zero
min' (Succ x) (Succ y) = Succ (min' x y)

The big difference is that now we can get a partial result before evaluating the arguments completely:

min (Succ undefined) (Succ undefined) === undefined
min' (Succ undefined) (Succ undefined) === Succ (min' undefined undefined)

Now we can use it as follows:

diffIs2 :: Eq a => [a] -> [a] -> Bool
diffIs2 x y = Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1 = go xs' ys'
      | otherwise = Succ $ go xs' ys `min'` go xs ys' `min'` go xs' ys'
    go xs ys = natLength $ xs ++ ys

Note that you do not even really need the extra min' nat3, because the top level comparison will only force the first three Succs anyway.

Upvotes: 9

Related Questions