atlantis

Reputation: 3126

Iterating over files using Haskell

I have a Haskell function that operates on a single file to produce a map. I want to iterate over all files in a directory and apply this function to produce a single map.

I am trying to approach it this way:

perFileFunc :: Int -> FilePath -> IO (Map.Map [Char] Double)

allFilesIn dir = filter (/= "..") <$> (filter (/= ".") <$> getDirectoryContents dir)

This gives me a list of all filenames in a directory except . and ..

Now when I try and do

myFunc dir n = map (perFileFunc n) <$> allFilesIn dir

It typechecks but doesn't do anything. I was expecting a list of maps which I would join using a unionWith (+) perhaps.

This doesn't seem to be the right way to do this.

Upvotes: 2

Views: 1209

Answers (2)

Dan Burton

Reputation: 53665

The tricky thing to understand about Haskell is how to recognize and compose IO actions. Let's look at some type signatures.

dir :: FilePath
allFilesIn :: FilePath -> IO [FilePath]
perFileFunc :: Int -> FilePath -> IO (Map.Map [Char] Double)    

Now then, you said that for myFunc:

"I was expecting a list of maps"

So for that function, you want the type signature

myFunc :: FilePath -> Int -> [Map.Map String Double]

Of course, the return type can't be just [Map.Map String Double], because we need to perform some IO in order to evaluate myFunc. So given a FilePath and an Int, we actually want the return type to be an IO action that produces a [Map.Map String Double]:

myFunc :: FilePath -> Int -> IO [Map.Map String Double]

Now then, let's look at the IO actions we will be composing to create this function.

allFilesIn dir :: IO [FilePath]
perFileFunc n  :: FilePath -> IO (Map.Map String Double)

perFileFunc isn't actually an IO action, but it is a function that, given a FilePath, produces an IO action. So let's see...if we run the allFilesIn action, then we can work with that list and run perFileFunc n on each of its elements.

myFunc dir n = do
  files <- allFilesIn dir
  ???

So what goes in the ??? spot? We have a [FilePath] at our disposal, since we used <- to run the action allFilesIn dir. And we have a function, perFileFunc n :: FilePath -> IO (Map.Map String Double). And we want the result to have the type IO [Map.Map String Double].

Stop...Hoogle time! Generalizing the components we have (a = FilePath, b = Map.Map String Double), we hoogle for [a] -> (a -> IO b) -> IO [b] (pretending we haven't seen ehird's answer yet). Lo and behold, mapM is the magical solution we were looking for! (or forM, which is just flip mapM)

myFunc dir n = do
  files <- allFilesIn dir
  mapM (perFileFunc n) files

If you desugar this do notation, you'll find it reduces to ehird's answer:

myFunc dir n = allFilesIn dir >>= (\files -> mapM (perFileFunc n) files)
-- eta reduce (\x -> f x) ==> f
myFunc dir n = allFilesIn dir >>= mapM (perFileFunc n)
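
The question ultimately asks for a single map rather than a list of maps. One way to get there, as a rough sketch, is to merge the list with Map.unionsWith (+) after mapM has run. Two assumptions here are not from the question: the body of perFileFunc is a made-up word counter standing in for the real function, and combinedMap is a hypothetical name. Note also that getDirectoryContents and listDirectory return bare entry names, so each name probably needs to be joined with dir before it is opened. The sketch assumes the directory holds only regular files.

```haskell
import qualified Data.Map.Strict as Map
import System.Directory (listDirectory)
import System.FilePath ((</>))

-- Hypothetical stand-in for the question's perFileFunc: counts each word in
-- the file, weighting every occurrence by n. The real body is not shown.
perFileFunc :: Int -> FilePath -> IO (Map.Map String Double)
perFileFunc n path = do
  contents <- readFile path
  let weight = fromIntegral n
  -- ($!) forces the map, so the file is fully read before we move on.
  return $! Map.fromListWith (+) [(w, weight) | w <- words contents]

-- Run perFileFunc on every entry of dir and merge the results,
-- adding the values of keys that occur in more than one file.
combinedMap :: FilePath -> Int -> IO (Map.Map String Double)
combinedMap dir n = do
  names <- listDirectory dir                       -- bare names, no "." or ".."
  maps  <- mapM (perFileFunc n . (dir </>)) names  -- prefix each name with dir
  return (Map.unionsWith (+) maps)
```

listDirectory is available in reasonably recent versions of the directory package; with an older version, the filtered getDirectoryContents from the question should work the same way.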

Upvotes: 2

ehird

Reputation: 40787

Your code doesn't work properly because (<$>) lifts a pure function into a monadic (actually, any Functor) context, so your myFunc dir n is of type IO [IO (Map.Map [Char] Double)]: an IO action that, when executed, finds the list of files in a directory and maps each one to another IO action that, when executed, would produce the Map you want, without actually executing any of them. That's probably not what you want :)
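
To make the "nothing is executed" point concrete, here is a small sketch that counts how many of the inner actions have run. All names in it (visit, listNames, demo) are hypothetical stand-ins, with listNames playing the role of allFilesIn dir and visit the role of perFileFunc n.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical per-item action: bumps a counter so we can see whether it ran.
visit :: IORef Int -> String -> IO Int
visit counter name = do
  modifyIORef' counter (+ 1)
  return (length name)

-- Stand-in for allFilesIn dir: an IO action yielding a fixed list of names.
listNames :: IO [String]
listNames = return ["a.txt", "bc.txt"]

-- Reads the counter once after only building the inner actions with
-- map and (<$>), and once more after running them with sequence.
demo :: IO (Int, Int)
demo = do
  counter <- newIORef 0
  pending <- map (visit counter) <$> listNames   -- pending :: [IO Int]
  before  <- readIORef counter                   -- no inner action has run yet
  _       <- sequence pending                    -- run each action in order
  after   <- readIORef counter
  return (before, after)
```

Running demo should give (0, 2): the counter is untouched until sequence runs the two pending actions.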

You want to execute a function returning a monadic action over every element of a list, and return a list of the resulting values. That's what mapM does:

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
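
As a quick illustration of that signature in two different monads (parseAll and echoAll are made-up names for this sketch):

```haskell
import Text.Read (readMaybe)

-- mapM in the Maybe monad: succeeds only if every element parses.
parseAll :: [String] -> Maybe [Int]
parseAll = mapM readMaybe

-- mapM in IO: runs one action per element, left to right,
-- and collects the results in the same order.
echoAll :: [String] -> IO [Int]
echoAll = mapM (\s -> putStrLn s >> return (length s))
```

For example, parseAll ["1", "2", "3"] is Just [1,2,3], while parseAll ["1", "x"] is Nothing.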

So what you really want is:

myFunc dir n = allFilesIn dir >>= mapM (perFileFunc n)

You use (>>=) because allFilesIn dir is a monadic action itself, and you want to pass its result to a function that expects a value of that type and returns another action (in this case, mapM (perFileFunc n)).

Note that mapM is unlike map in that, in IO (not every monad behaves like this, but most do), it executes every action before returning the list. This means that the results of all the actions must collectively fit into memory, and you won't be able to process them incrementally. If you need incremental processing, you'll need something other than mapM, such as iteratees.
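
If the goal is only to merge the maps, a lighter alternative to iteratees may be a monadic fold. This is a sketch using foldM from Control.Monad, not an iteratee-based solution. It merges each file's map into a running total as soon as it is produced. The name accumulate is hypothetical, and perFile stands in for perFileFunc n.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Fold over the paths, merging each file's map into the accumulator as soon
-- as it is produced, so the full list of per-file maps is never built.
-- perFile stands in for the question's (perFileFunc n).
accumulate :: (FilePath -> IO (Map.Map String Double))
           -> [FilePath]
           -> IO (Map.Map String Double)
accumulate perFile = foldM step Map.empty
  where
    step acc path = do
      m <- perFile path
      -- ($!) forces the merged map now instead of leaving a chain of thunks.
      return $! Map.unionWith (+) acc m
```

With the definitions from the question, this could be used as allFilesIn dir >>= accumulate (perFileFunc n).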

Upvotes: 6
