Reputation: 53
This is a classic programming problem https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
The JS implementation passes all the tests but the Haskell one consumes too much memory and gets killed.
What am I doing wrong?
-- TOP TO BOTTOM
commonChild s1 s2 = L.foldl g l1 l ! n
where
n = length s1
l1 = arr $ replicate (n + 1) 0
l = [ [(x,i,y,j) | (y,j) <- zip s2 [1..]]
| (x,i) <- zip s1 [1..]]
g a = L.foldl (\a' (x,i,y,j) -> let x' = if x == y
then 1 + a ! (j - 1)
else max (a ! j) (a' ! (j - 1))
in a' // [(j,x')])
l1
arr l = array (0,length l-1) $ zip [0..] l
function lcs(a,b) {
let n = a.length
let a1 = []
for (let i = 0; i <= n; i++) {
a1.push(0)
}
for (let i = 0; i < b.length; i++) {
let a2 = [0]
for (let j = 0; j < n; j++) {
let x = b[i] == a[j] ? 1 + a1[j] : Math.max(a1[j+1],a2[j])
a2.push(x)
}
a1 = a2
}
return a1[n]
}
console.log(lcs("SHINCHAN","NOHARAAA"))
https://repl.it/@leonbobster/LCS#main.hs
https://www.hackerrank.com/challenges/common-child/problem
Upvotes: 2
Views: 186
Reputation: 48611
Reading the Wikipedia description of the algorithm led me quite directly to an implementation using only lists; no arrays:
{-# LANGUAGE BangPatterns #-}
-- Calculate the next row from the character along the
-- left edge, the string along the top edge, and the previous
-- row.
makeRow :: Char -> String -> [Int] -> [Int]
makeRow match = go 0 0
where
-- The first arguments are the values in the arguments
-- to the upper left and immediate left of the current
-- cell.
go :: Int -> Int -> Char -> String -> [Int] -> [Int]
go !up_left !left (c:cs) (l:ls) =
cur : go l cur cs ls
where
!cur
| c == match = 1 + up_left
| otherwise = max l left
go _ _ _ _ = []
commonChild s1 = go (repeat (0 :: Int))
where
go ls [] = last ls
go ls (c:cs) = go (makeRow c s1 ls) cs
This is fast enough to pass all the tests, and it's a lot simpler than mucking with arrays. Constant factors could be improved in various ways, but this is a good place to start. The first way I'd try to improve this is to replace [Int]
everywhere with a type that looks like
data IntList = Cons !Int IntList | Nil
This saves two words of memory and one pointer indirection per element. Switching to unboxed arrays (at least for the Int
lists) should give further improvements in many cases, but it'll be considerably more annoying.
Upvotes: 2
Reputation: 3924
Your use of //
from Data.Array
is really killing your performance. If you read the docs, it says it "Constructs an array identical to the first argument except that it has been updated by the associations in the right argument", which means that every time you call this, you're constructing a brand new array. This is very different from your js implementation, which simply appends.
You may think that arrays are the obvious choice for getting a performance boost, but this is one of those times where regular old lists will do just fine. Rather than generate a new array on every iteration of your fold, each with one new element over the previous, you can just cons onto a list. Consider the following definition of your sub-function g
:
g a = arr . reverse . L.foldl (inner a) [0]
inner a a'@(z:_) (x,i,y,j) =
let x' = if x == y
then 1 + a ! (j - 1)
else max (a ! j) z
in x':a'
Note: The changes I made above were all about choosing a better data structure, but see @chi's answer for more ways to improve performance having to do with negotiating laziness/strictness and doing GHC-specific things.
Upvotes: 5
Reputation: 116164
I slightly modified your code by
foldl'
-O2
(avoid GHCi)Here's the modified code (with the long test strings removed):
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS -Wall -O2 #-}
module Main where
import qualified Data.List as L
import Data.Array
commonChild :: Eq a => [a] -> [a] -> Int
commonChild s1 s2 = L.foldl' g l1 l ! n
where
n = length s1
l1 = arr $ replicate (n + 1) 0
l = [[(x,i,y,j) | (y,j) <- zip s2 [1..]] | (x,i) <- zip s1 [(1::Int)..]]
g a = L.foldl' (\ !a' (!x,!_i,!y,!j) -> let
! x' = if x == y
then 1 + a ! (j - 1)
else max (a ! j) (a' ! (j - 1)) in a' // [(j,x')]) l1
arr :: [e] -> Array Int e
arr l = array (0,length l-1) $ zip [0..] l
s1test :: String
s1test = "UBBJXJGKLXGXTFBJ..." -- omitted
s2test :: String
s2test = "WZFPTGLCXK..." -- omitted
main :: IO ()
main = do
print $ commonChild "SHINCHAN" "NOHARAAA"
print $ commonChild s1test s2test
The above code ran using under 6MB of RAM, and completed in 3m10s printing 3
and 1417
as output.
By comparison, the original code was using 12GB+ of RAM when I terminated it.
There should be more room for improvement. The arrays in Data.Array
can be slow, since each array update has to re-create a new array. When the imperative algorithm can not easily be translated into a nice functional one, perhaps it's better to embrace the imperative side for a moment and start using STUArray
and its associated functions, writing some code which precisely mimics the imperative one you posted. Using runST
, you can still achieve a pure functional interface and expose the a similar type
commonChild ::
( Eq a
, forall s. MArray (STUArray s) a (ST s) -- requires some extension
) => [a] -> [a] -> Int
(or simply give up polymorphism and use String -> String -> Int
).
Upvotes: 3