Philip

Reputation: 1532

Attempting lazy I/O, but the entire file is consumed

I am a Haskell newbie. I want to read only N characters of a text file into memory. So I wrote this code:

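import Control.Monad
import System.IO
import System.IO.Error (isEOFError, tryIOError)
import System.IO.Unsafe (unsafeInterleaveIO)

-- transformChar :: Char -> Char is defined elsewhere (not shown).

main :: IO ()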
main = do
  inh <- openFile "input.txt" ReadMode
  transformedList <- Control.Monad.liftM (take 4) $ transformFileToList inh
  putStrLn "transformedList became available"
  putStrLn transformedList
  hClose inh

transformFileToList :: Handle -> IO [Char]
transformFileToList h = transformFileToListAcc h []

transformFileToListAcc :: Handle -> [Char] -> IO [Char]
transformFileToListAcc h acc = do
  readResult <- tryIOError (hGetChar h)
  case readResult of
    Left e -> if isEOFError e then return acc else ioError e
    Right c -> do let acc' = acc ++ [transformChar c]
                  putStrLn "got char"
                  unsafeInterleaveIO $ transformFileToListAcc h acc'

My input file has several lines, the first being "hello world". When I run this program, I get this output:

got char
transformedList became available
got char
["got char" a bunch of times]
hell

I expected "got char" to be printed only 4 times. Instead, the entire file is read, one character at a time, and only then are the first 4 characters taken.

What am I doing wrong?

Upvotes: 1

Views: 131

Answers (2)

Ben Millwood

Reputation: 7001

The way transformFileToListAcc is written, you can't work out what the first four characters are going to be until you've done all the IO. To see that, consider this modified form:

transformFileToListAcc :: Handle -> [Char] -> IO [Char]
transformFileToListAcc h acc = do
  readResult <- tryIOError (hGetChar h)
  case readResult of
    Left e -> if isEOFError e then return ("answer: " ++ acc) else ioError e
    Right c -> do let acc' = acc ++ [transformChar c]
                  putStrLn "got char"
                  unsafeInterleaveIO $ transformFileToListAcc h acc'

The first four characters here are "answ", but you don't find that out until you get to the end of the file. If you want to be able to consume the file lazily, you have to commit to returning the first character outside of the recursive call, e.g.

transformFileToList :: Handle -> IO [Char]
transformFileToList h = do
  readResult <- tryIOError (hGetChar h)
  case readResult of
    Left e -> if isEOFError e then return [] else ioError e
    Right c -> do putStrLn "got char"
                  rest <- unsafeInterleaveIO $ transformFileToList h
                  return (transformChar c : rest)

Now I don't need to do the recursive call to see what the first character of the result is.
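
To make the difference concrete, here is a minimal sketch that counts reads for both styles. It is not the original program: Source, nextChar, accStyle, consStyle and readsFor are hypothetical names, an in-memory string with a counter stands in for the Handle, and toUpper stands in for transformChar, which the question does not show.

```haskell
import Control.Exception (evaluate)
import Data.Char (toUpper)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for a Handle: the remaining input plus a read counter.
data Source = Source { remaining :: IORef String, readCount :: IORef Int }

newSource :: String -> IO Source
newSource s = Source <$> newIORef s <*> newIORef 0

-- Stand-in for hGetChar; returns Nothing at end of input instead of throwing.
nextChar :: Source -> IO (Maybe Char)
nextChar src = do
  xs <- readIORef (remaining src)
  case xs of
    []     -> return Nothing
    (c:cs) -> do
      writeIORef (remaining src) cs
      modifyIORef' (readCount src) (+ 1)
      return (Just c)

-- Accumulator style: the result list only exists once the input is exhausted.
accStyle :: Source -> String -> IO String
accStyle src acc = do
  r <- nextChar src
  case r of
    Nothing -> return acc
    Just c  -> unsafeInterleaveIO (accStyle src (acc ++ [toUpper c]))

-- Cons style: the head is returned outside the deferred recursive call.
consStyle :: Source -> IO String
consStyle src = do
  r <- nextChar src
  case r of
    Nothing -> return []
    Just c  -> do
      rest <- unsafeInterleaveIO (consStyle src)
      return (toUpper c : rest)

-- Force the first n characters, then report how many reads that cost.
readsFor :: (Source -> IO String) -> Int -> String -> IO (String, Int)
readsFor build n input = do
  src <- newSource input
  xs  <- build src
  let prefix = take n xs
  _ <- evaluate (length prefix)
  count <- readIORef (readCount src)
  return (prefix, count)

main :: IO ()
main = do
  readsFor (\s -> accStyle s []) 4 "hello world" >>= print
  readsFor consStyle 4 "hello world" >>= print
```

With GHC this should print ("HELL",11) for the accumulator style and ("HELL",4) for the cons style: the first has to reach the end of the input before any list cell exists, while the second hands back one cell per read.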

(That said, not doing lazy I/O is probably better. Just wanted to also add this answer to the question as stated.)

Upvotes: 1

Danny Navarro

Reputation: 2753

I admit I don't understand how unsafeInterleaveIO works, but I suspect the problem is related to it. Maybe you are using this example to learn about unsafeInterleaveIO, but if I were you I'd avoid using it directly. Here is how I'd do it in your particular case:

import Control.Monad (replicateM)
import System.IO

main :: IO ()
main = do
    inh <- openFile "input.txt" ReadMode
    charList <- replicateM 4 $ hGetChar inh
    let transformedList = map transformChar charList
    putStrLn "transformedList became available"
    putStrLn transformedList
    hClose inh

This should just read the first 4 characters of the file.
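
One caveat: replicateM 4 (hGetChar inh) raises an end-of-file error if the file holds fewer than 4 characters. Below is a small sketch of a strict read that stops early instead. hGetAtMost and the file name demo-input.txt are hypothetical, and main writes the demo file itself so the example can be run on its own.

```haskell
import System.IO

-- Hypothetical helper: read at most n characters strictly,
-- stopping early if the end of the file is reached first.
hGetAtMost :: Handle -> Int -> IO String
hGetAtMost h n
  | n <= 0    = return []
  | otherwise = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          c    <- hGetChar h
          rest <- hGetAtMost h (n - 1)
          return (c : rest)

main :: IO ()
main = do
  -- Assumed demo input, deliberately shorter than 4 characters.
  writeFile "demo-input.txt" "hi"
  -- withFile closes the handle even if an exception is thrown.
  s <- withFile "demo-input.txt" ReadMode (\h -> hGetAtMost h 4)
  putStrLn s
```

Because every character is read before hGetAtMost returns, the handle can be closed safely as soon as the result comes back, which is not true of the unsafeInterleaveIO versions.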

If you are looking for a truly effectful streaming solution, I'd look into pipes or conduit instead of unsafeInterleaveIO.

Upvotes: 2
