Reputation: 4219
I am writing a program to determine if the Levenshtein distance between two strings is exactly 2 in linear time.
I have an algorithm which does this. I use the naive recursive approach which scans the string and when it finds a discrepancy it branches into 3 options trying either to delete, insert or replace. However I make a modification that if we exceed 2 as our total we just give up on that branch. This limits the total number of branches to 9 and makes the algorithm linear.
Here's my initial implementation:
diffIs2 x y =
2 == go 0 x y
where
go total xs@(x1:xs') ys@(y1:ys')
| x1 == y1
=
go total xs' ys'
| total < 2
=
minimum $ zipWith (go $ total + 1)
[ xs', xs , xs' ]
[ ys , ys', ys' ]
go total xs ys =
total + length xs + length ys
Testing confirms that this runs in linear time.
I also have a second similar program which as far as I can tell should be linear as well.
The idea here is to use short circuiting and lazy evaluation to limit the number of branches evaluated instead. We always allow a branching however, when we branch we take the minimum of the result of each and 3. That way if the branch total exceeds 3 the short circuit should prevent further evaluation.
We have to implement a natural number type for partial evaluation.
data Nat
= Zero
| Succ Nat
deriving
( Eq
, Ord
)
natLength :: [a] -> Nat
natLength [] =
Zero
natLength (a : b) =
Succ (natLength b)
nat3 =
Succ $ Succ $ Succ Zero
diffIs2 x y =
Succ (Succ Zero) == go x y
where
go xs@(x1:xs') ys@(y1:ys')
| x1 == y1
=
go xs' ys'
| otherwise
=
Succ $
minimum $
map (min nat3) $
zipWith go
[ xs', xs , xs' ]
[ ys , ys', ys' ]
go xs ys =
natLength $ xs ++ ys
Testing this reveals it doesn't run in linear time. Something makes it exponential but I can't figure out what. The short circuiting does work as expected. We can show this with the following program which halts in finite time because of the short circuiting of min
:
f = Succ f
main =
print $ (Succ Zero) == min (Succ Zero) f
but when put together in the algorithm it doesn't seem to work as I expect.
Why is the second code exponential? What is my misconception?
Upvotes: 6
Views: 311
Reputation: 10695
While the default min
is lazy enough for your simple example at the end of your question, it is not as lazy as we would hope:
ghci> let inf = Succ inf
ghci> inf `min` inf `min` Zero == Zero
^CInterrupted.
To fix it we need a lazier min
:
min' :: Nat -> Nat -> Nat
min' Zero _ = Zero
min' _ Zero = Zero
min' (Succ x) (Succ y) = Succ (min' x y)
The big difference is that now we can get a partial result before evaluating the arguments completely:
min (Succ undefined) (Succ undefined) === undefined
min' (Succ undefined) (Succ undefined) === Succ (min' undefined undefined)
Now we can use it as follows:
diffIs2 :: Eq a => [a] -> [a] -> Bool
diffIs2 x y = Succ (Succ Zero) == go x y
where
go xs@(x1:xs') ys@(y1:ys')
| x1 == y1 = go xs' ys'
| otherwise = Succ $ go xs' ys `min'` go xs ys' `min'` go xs' ys'
go xs ys = natLength $ xs ++ ys
Note that you do not even really need the extra min' nat3
, because the top level comparison will only force the first three Succ
s anyway.
Upvotes: 9