user855443
user855443

Reputation: 2950

breadth-first traversal of directory tree is not lazy

I try to traverse the directory tree. A naive depth-first traversal seems not to produce the data in a lazy fashion and runs out of memory. I next tried a breadth first approach, which shows the same problem - it uses all the memory available and then crashes.

the code I have is:

getFilePathBreadtFirst :: FilePath -> IO [FilePath]
getFilePathBreadtFirst fp = do
  fileinfo <- getInfo fp 
  res :: [FilePath]  <- if isReadableDirectory fileinfo
          then do
                children  <- getChildren fp 
                lower    <-  mapM getFilePathBreadtFirst children  
                return (children ++  concat lower)
           else return [fp]        -- should only return the files? 
  return res 

getChildren :: FilePath -> IO [FilePath]
getChildren path = do 
          names <- getUsefulContents path
          let namesfull = map (path </>) names
          return namesfull

testBF fn = do  -- crashes for /home/frank, does not go to swap 
  fps <- getFilePathBreadtFirst fn
  putStrLn $ unlines fps

I think all the code is either linear or tail recursive, and I would expect that the listing of filenames starts immediately, but in fact it does not. Where is the error in my code and my thinking? Where have I lost lazy evaluation?

Upvotes: 8

Views: 703

Answers (2)

user855443
user855443

Reputation: 2950

i have separated the managemetn of the pipe and the tree traversal. here first the code for the pipe (essentially the code of gonzales - thank you!):

traverseTree :: FilePath -> Producer FilePath IO ()
-- ^ traverse a tree in breadth first fashion using an external doBF function 
traverseTree path = (`evalStateT` empty) $ loop $ do
-- All code past this point uses the following monad transformer stack:
-- MaybeT (StateT (Seq FilePath) (Producer FilePath IO)) ()
let liftState = lift
    liftPipe  = lift . lift
    liftIO    = lift . lift . lift
liftState $ modify (|> path)
forever $ do
    x <- liftState $ gets viewl
    case x of
        EmptyL    -> quit
        file :< s -> do
            (yieldval, nextInputs) <- liftIO $ doBF file 
            liftState $ put s
            liftPipe $ yield yieldval
            liftState $ forM_ nextInputs $ \name -> modify (|> name)

next the code for the tree traversal:

doBF :: FilePath -> IO (FilePath, [FilePath])
doBF file = do 
    finfo <- getInfo file
    let p =  isReadableDirectoryNotLink finfo
    namesRes <- if p then do
        names :: [String] <- liftIO $ getUsefulContents file
        let namesSorted = sort names
        let namesfull = map (file </>) namesSorted
        return namesfull
        else return []          
    return (file, namesRes) 

I hope to replace doBF with a similar function to traverse depth first. i assumed that i could make traverseTree more general and not only for FilePath ~ String, but i do not see in which class the empty function on sequences is. could be generally useful.

Upvotes: 0

Gabriella Gonzalez
Gabriella Gonzalez

Reputation: 35099

I will use three separate tricks to solve your question.

  • Trick 1: Use the pipes library to stream file names concurrent with traversing the tree.
  • Trick 2: Use the StateT (Seq FilePath) transformer to achieve a breadth-first traversal.
  • Trick 3: Use the MaybeT transformer to avoid manual recursion when writing the loop and exit.

The following code combines these three tricks in one monad transformer stack.

import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Maybe
import Control.Monad.State.Lazy
import Control.Pipe
import Data.Sequence
import System.FilePath.Posix
import System.Directory

loop :: (Monad m) => MaybeT m a -> m ()
loop = liftM (maybe () id) . runMaybeT . forever

quit :: (Monad m) => MaybeT m a
quit = mzero

getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
  = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path

permissible :: FilePath -> IO Bool
permissible file
  = fmap (\p -> readable p && searchable p) $ getPermissions file

traverseTree :: FilePath -> Producer FilePath IO ()
traverseTree path = (`evalStateT` empty) $ loop $ do
    -- All code past this point uses the following monad transformer stack:
    -- MaybeT (StateT (Seq FilePath) (Producer FilePath IO)) ()
    let liftState = lift
        liftPipe  = lift . lift
        liftIO    = lift . lift . lift
    liftState $ modify (|> path)
    forever $ do
        x <- liftState $ gets viewl
        case x of
            EmptyL    -> quit
            file :< s -> do
                liftState $ put s
                liftPipe $ yield file
                p <- liftIO $ doesDirectoryExist file
                when p $ do
                    names <- liftIO $ getUsefulContents file
                    -- allowedNames <- filterM permissible names
                    let namesfull = map (path </>) names
                    liftState $ forM_ namesfull $ \name -> modify (|> name)

This creates a generator of breadth-first file names that can be consumed concurrent with the tree traversal. You consume the values using:

printer :: (Show a) => Consumer a IO r
printer = forever $ do
    a <- await
    lift $ print a

>>> runPipe $ printer <+< traverseTree path
<Prints file names as it traverses the tree>

You can even choose to not demand all the values:

-- Demand only 'n' elements
take' :: (Monad m) => Int -> Pipe a a m ()
take' n = replicateM_ n $ do
    a <- await
    yield a

>> runPipe $ printer <+< take' 3 <+< traverseTree path
<Prints only three files>

More importantly, that last example will only traverse the tree as much as necessary to generate the three files and then it will stop. This prevents wastefully traversing the entire tree when all you wanted was 3 results!

To learn more about the pipes library trick, consult the pipes tutorial at Control.Pipes.Tutorial.

To learn more about the loop trick, read this blog post.

I couldn't find a good link for the queue trick for breadth first traversal, but I know it's out there somewhere. If somebody else knows a good link for this, just edit my answer to add it.

Upvotes: 7

Related Questions