Need help analysing code & profiling results

Question

I am trying to make a function more efficient but I have made it worst and I could not understand why. Could someone see why and explain to me please?

Original function:

substringsSB s = substringsSB' Set.empty s
substringsSB' m s = substrings' m s
  where
    substrings' m s  = {-# SCC "substrings'" #-}if (Set.member s m) then m else foldl' insertInits m (init . B.tails $ s)
    insertInits m s = {-# SCC "insertInits" #-}if (Set.member s m) then m else foldl' doInsert m (tail . B.inits $ s)
    doInsert m k = {-# SCC "doInsert" #-}Set.insert k m

profiling result:

    total time  =        3.14 secs   (157 ticks @ 20 ms)
    total alloc = 1,642,067,360 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

doInsert                       Main                  95.5   92.1
insertInits                    Main                   2.5    7.8
substringsSB'                  Main                   1.9    0.0


                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0   0.0    0.0   100.0  100.0
 main                    Main                                                 280           1   0.0    0.0   100.0  100.0
  substringsSB           Main                                                 281           1   0.0    0.0   100.0  100.0
   substringsSB'         Main                                                 282           1   1.9    0.0   100.0  100.0
    doInsert             Main                                                 285     1233232  95.5   92.1    95.5   92.1
    insertInits          Main                                                 284        1570   2.5    7.8     2.5    7.8
    substrings'          Main                                                 283           1   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.FD                                     211           3   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Encoding.Iconv                                169           2   0.0    0.0     0.0    0.0
 CAF                     GHC.Conc.Signal                                      166           1   0.0    0.0     0.0    0.0

As far as I know, we cannot have early-exit in a ~~fold~~foldl, so the function could be spending a lot of time just calling Set.member s m and return m in substrings'. So, I converted the function to use recursion:

substringsSB s = substringsSB' Set.empty s
substringsSB' m str = substrings' m (init . B.tails $ str)
  where
    substrings' m [] = m
    substrings' m (s:ss) | Set.member s m = m
                         | otherwise      = {-# SCC "substrings'" #-}substrings' insertTail ss
                         where insertTail = insertInits m $ reverse $ (tail . B.inits $ s)
    insertInits m [] = m
    insertInits m (s:ss) | Set.member s m = m
                         | otherwise      = {-# SCC "insertInits" #-}insertInits (doInsert s m) ss
    doInsert k m = {-# SCC "doInsert" #-}Set.insert k m

profiling result:

    total time  =        5.16 secs   (258 ticks @ 20 ms)
    total alloc = 1,662,535,200 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

doInsert                       Main                  54.7   90.5
substringsSB'                  Main                  43.8    9.5
insertInits                    Main                   1.6    0.0


                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0   0.0    0.0   100.0  100.0
 main                    Main                                                 280           1   0.0    0.0   100.0  100.0
  substringsSB           Main                                                 281           1   0.0    0.0   100.0  100.0
   substringsSB'         Main                                                 282           1  43.8    9.5   100.0  100.0
    doInsert             Main                                                 285     1225600  54.7   90.5    54.7   90.5
    insertInits          Main                                                 284     1225600   1.6    0.0     1.6    0.0
    substrings'          Main                                                 283        1568   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.FD                                     211           3   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Encoding.Iconv                                169           2   0.0    0.0     0.0    0.0
 CAF                     GHC.Conc.Signal                                      166           1   0.0    0.0     0.0    0.0

But this take more time than the original version. Why it's spending so much time in substringsSB'? It's only doing init . B.tails $ str which the original version also call... Or have I made a mistake and these two functions are not logically equivalent?

main = do
  s <- getLine
  let m = substringsSB $ B.pack s
  print $ Set.size m
  return ()

with input:

asjasdfkjasdfjkasdjlflaasdfjklajsdflkjasvdadufhsaodifkljaiduhfjknhdfasjlkdfndbhfisjglkasnjjfgklsadmsjnhsjdflkmsnajjkdlsmfnjsdkfljasd;fjlkasdjfklasjdfnasdfjjnsadfjsadfhasjdfjlaksdfjlkasdfjljkasdflasidfjlaisjdflaisdjflaisjdfliasjdgfouqhagdfsia;klsjdfnklajsdfkhkasfhjdasdfhaskdflhjaklsdfh;kjlasdfh;jlaksdflkhajsdfkjahsdfkjhasdfkkasdfkjlkasfdkljasdfkhljkasdkflkjasdfasdlfkajsdlfkjaslkdfjjaksdjgujhgjhghjbjnbghjghhgfghfghvfgfgjhgjhdfjfjhgfjgvjhgvjhgvjhgvjhgvjhgvjhasdkfjkasdjfklajsdfklkahsdfjklhjklhghjhkhgfvcghjkjhghjkjhhvjkl/ljklkjlkjlkjlkjaslkdfjasd;lkfjas;dlfkjas;dflkjas;dflkjas;dflkjas;dflkja;slkdfja;sdlkjfa;sdlkfja;lsdfkjas;ldkfja;sdlkfja;skldfja;slkdjfa;slkdfja;sdklfjas;dlkfjas;dklfjas;dlkfjas;dfkljas;dflkjas;lkdfja;sldkfj;aslkdfja;sldkfja;slkdfj;alksdjf;alsdkfj;alsdkfja;sdflkja;sdflkja;sdlfkja;sdlfkja;sldkfja;sdlkfja;sldfkj;asldkfja;sldkfja;lsdkfja;sldfkja;sdlfjka;sdlfjkas;dlkfjas;ldkfjas;dlfkjasfd;lkjasd;fljkads;flkjasdf;lkjasdf;lkajsdf;lkajsdf;aksljdf;alksjdfa;slkdjfa;slkdjfa;slkdfja;sdflkjas;dflkjasd;flkjasd;flkjasdf;lkjasdf;ljkasdf;lkajdsf;laksjf;asldfkja;sdfljkads;flkjasd;fljkasdf;lkjasdf;ljkadfs;fljkadfs;ljkasdf;lajksdf;lkajsdf;lajsfd;laksdfgvjhgvjhgvjhcfjhgcjfgvjkgvjjgfjghfhgkhkjhbkjhbkjhbkybkkugtkydfktyufctkyckxckghfvkuygjkhbykutgtvkyckjhbliuhgktuyfkvuyjbjkjygvkuykjdjflaksdjflkajsdlkfjalskdjflkasjdflkjasdlkfjalksdjfklajsdflkjasdlkjfalksdjflkasjdflkjasdlfkjaslkdjflaksjdflkajsdlfkjasdlkfjalsdjflkasjdflkasjdflajsdfjsfuhaduvasdyhaweuisfnaysdfiuhasfdnhaksjdfahsdfiujknsadfhbaiuhdfjknahbdshfjksnashdfkjnsadfiukjfnhsdfkjnasdfikjansdfhnaksdjfaisdfkn

Daniel Fischer · Accepted Answer

The sad truth is that Set.member is expensive too.

In the first version, you check for each tail if it has been seen before and if so, ignore it, otherwise insert all its nonempty inits. If the input is sufficiently irregular, that's O(n) membership tests and O(n^2) inserts, altogether O(n^2*log n) (assuming O(1) average cost for the comparisons). If the input is periodic with shortest (positive) period p, only the first p tails lead to inserts, so that's O(n) tests and O(p*n) inserts, O(p*n*log n) overall (that's a bit cheated, the average cost for comparisons could be up to O(p) if p > 1 and O(n) if p == 1, but if the period itself is irregular, O(1) for the comparisons is okay).

In the second,

substringsSB s = substringsSB' Set.empty s
substringsSB' m str = substrings' m (init . B.tails $ str)
  where
    substrings' m [] = m
    substrings' m (s:ss) | Set.member s m = m
                         | otherwise      = substrings' insertTail ss
                           where
                             insertTail = insertInits m $ reverse $ (tail . B.inits $ s)

you check for each tail if it has been seen before, if so stop. That's good, but doesn't gain much over the first In the first, if a tail has been seen before, all further tails have also been seen before, so you only skip at most O(n) membership tests, O(n*log n) operations. For normally irregular input, only a few of the shortest tails have been seen before, so only few tests are skipped - very little gain.

    insertInits m [] = m
    insertInits m (s:ss) | Set.member s m = m
                         | otherwise      = insertInits (doInsert s m) ss
    doInsert k m = {-# SCC "doInsert" #-}Set.insert k m

If the tail hasn't been seen yet (normal), you start inserting its inits - from longest to shortest - breaking if any has been seen before (because then all shorter inits have also been seen before). That's great if many long inits occur multiple times, but if not, all you have is O(n^2) additional membership tests.

For ordinary irregular input no long substrings occur multiple times, but a number of short ones do, and the few inserts saved do not compensate for the additional membership tests, rendering the second method slower by a constant factor. (Membership testing is cheaper than insertion, so the factor should be less than 2.)

For periodic input, the first method also avoids unnecessary inserts, the second saves O(n) tests in the outer loop, but adds O(p*n) tests in the inner loop, making it slightly worse than in the irregular case.

But for some inputs, the second method can be dramatically better. Try both for

main = do
    let x = substringsSB $ B.pack $ replicate 9999 97 ++ [98]
    print (Set.size x)

You can improve the second version by replacing the expensive member before the insert with a cheap size comparison after the insert,

substringsSB str = go 0 Set.empty (init $ B.tails str)
  where
    go sz m (s:ss)
        | Set.member s m = m
        | otherwise      = go nsz nm ss
          where
            (nsz,nm) = insInits sz m (reverse . tail $ B.inits s)
    go _ m [] = m
    insInits sz m (s:ss)
        | sz1 == sz     = (sz,m)
        | otherwise     = insInits sz1 nm ss
          where
            nm = Set.insert s m
            sz1 = Set.size nm
    insInits sz m [] = (sz,m)

That brings it close to the first version in the generic case, makes it slightly better (here) than the first version for concat $ replicate n "abcde" and much better for the evil example above.

Need help analysing code & profiling results

Answers (1)

Related Questions

Need help analysing code &amp; profiling results

Answers (1)

Related Questions

Need help analysing code & profiling results