Reputation: 1532
I am a Haskell newbie. I want to read only N characters of a text file into memory. So I wrote this code:
import Control.Monad (liftM)
import System.IO
import System.IO.Error (isEOFError, tryIOError)
import System.IO.Unsafe (unsafeInterleaveIO)

-- transformChar (used below) is defined elsewhere and not shown here.

main :: IO ()
main = do
    inh <- openFile "input.txt" ReadMode
    transformedList <- Control.Monad.liftM (take 4) $ transformFileToList inh
    putStrLn "transformedList became available"
    putStrLn transformedList
    hClose inh

transformFileToList :: Handle -> IO [Char]
transformFileToList h = transformFileToListAcc h []

transformFileToListAcc :: Handle -> [Char] -> IO [Char]
transformFileToListAcc h acc = do
    readResult <- tryIOError (hGetChar h)
    case readResult of
        Left e -> if isEOFError e then return acc else ioError e
        Right c -> do let acc' = acc ++ [transformChar c]
                      putStrLn "got char"
                      unsafeInterleaveIO $ transformFileToListAcc h acc'
My input file has several lines, the first one being "hello world". When I run this program, I get this output:
got char
transformedList became available
got char
["got char" a bunch of times]
hell
My expectation is that "got char" happens only 4 times. Instead, the entire file is read, one character at a time, and only THEN the first 4 characters are taken.
What am I doing wrong?
Upvotes: 1
Views: 131
Reputation: 7001
The way transformFileToListAcc is written, you can't work out what the first four characters are going to be until you've done all the IO. To see that, consider this modified form:
transformFileToListAcc :: Handle -> [Char] -> IO [Char]
transformFileToListAcc h acc = do
    readResult <- tryIOError (hGetChar h)
    case readResult of
        Left e -> if isEOFError e then return ("answer: " ++ acc) else ioError e
        Right c -> do let acc' = acc ++ [transformChar c]
                      putStrLn "got char"
                      unsafeInterleaveIO $ transformFileToListAcc h acc'
The first four characters here are "answ", but you don't find that out until you get to the end of the file. If you want to be able to consume the file lazily, you have to commit to returning the first character outside of the recursive call, e.g.:
transformFileToList :: Handle -> IO [Char]
transformFileToList h = do
    readResult <- tryIOError (hGetChar h)
    case readResult of
        Left e -> if isEOFError e then return [] else ioError e
        Right c -> do putStrLn "got char"
                      rest <- unsafeInterleaveIO $ transformFileToList h
                      return (transformChar c : rest)
Now I don't need to do the recursive call to see what the first character of the result is.
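To make the difference measurable without a file on disk, here is a rough sketch that swaps the Handle for an in-memory source that counts how many characters have been read. The names Source, nextChar, accumulating, headFirst and readsNeeded are made up for this illustration, and transformChar is left out to keep it short; it is not the code from the question, just the same two recursion shapes.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Hypothetical stand-in for a file handle: the input that is left,
-- plus a counter of how many characters have been read so far.
data Source = Source { remaining :: IORef String, readCount :: IORef Int }

newSource :: String -> IO Source
newSource s = Source <$> newIORef s <*> newIORef 0

-- | Read one character, or Nothing at end of input. Counts successful reads.
nextChar :: Source -> IO (Maybe Char)
nextChar src = do
  s <- readIORef (remaining src)
  case s of
    []       -> return Nothing
    (c : cs) -> do
      writeIORef (remaining src) cs
      modifyIORef' (readCount src) (+ 1)
      return (Just c)

-- | Accumulator style: the list only appears once the input is exhausted.
accumulating :: Source -> String -> IO String
accumulating src acc = do
  r <- nextChar src
  case r of
    Nothing -> return acc
    Just c  -> unsafeInterleaveIO (accumulating src (acc ++ [c]))

-- | Head-first style: each cons cell is returned before the rest is read.
headFirst :: Source -> IO String
headFirst src = do
  r <- nextChar src
  case r of
    Nothing -> return []
    Just c  -> do
      rest <- unsafeInterleaveIO (headFirst src)
      return (c : rest)

-- | Take n characters using the given reader, force them, and report
-- how many reads that cost.
readsNeeded :: (Source -> IO String) -> Int -> String -> IO (String, Int)
readsNeeded reader n input = do
  src <- newSource input
  xs  <- reader src
  let prefix = take n xs
  _ <- evaluate (length prefix)   -- force the spine of the prefix
  count <- readIORef (readCount src)
  return (prefix, count)
```

In GHCi, readsNeeded (\s -> accumulating s "") 4 "hello world" gives ("hell",11), while readsNeeded headFirst 4 "hello world" gives ("hell",4). The accumulator version has to reach the end of the input before it can produce even the first cons cell, which is the behaviour seen in the question.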
(That said, not doing lazy I/O is probably better. Just wanted to also add this answer to the question as stated.)
Upvotes: 1
Reputation: 2753
I admit I don't understand how unsafeInterleaveIO works, but I suspect the problem here is somehow related to it. Maybe you are using this example to learn about unsafeInterleaveIO, but if I were you I'd avoid using it directly. Here is how I'd do it in your particular case:
import Control.Monad (replicateM)
import System.IO

main :: IO ()
main = do
    inh <- openFile "input.txt" ReadMode
    charList <- replicateM 4 $ hGetChar inh
    let transformedList = map transformChar charList
    putStrLn "transformedList became available"
    putStrLn transformedList
    hClose inh
This should just read the first 4 characters of the file.
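One caveat: hGetChar throws an EOF error when nothing is left, so replicateM 4 fails on a file shorter than 4 characters. If that matters, a possible variant checks hIsEOF before each read and stops early. The helper names hGetAtMost and readPrefix below are invented for this sketch, and withFile is used so the handle is closed even if a read throws.

```haskell
import System.IO

-- | Read at most n characters from the handle, stopping early at end of file.
hGetAtMost :: Handle -> Int -> IO String
hGetAtMost h n
  | n <= 0    = return []
  | otherwise = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          c  <- hGetChar h
          cs <- hGetAtMost h (n - 1)
          return (c : cs)

-- | Open the file, read at most n characters strictly, and close it again.
readPrefix :: FilePath -> Int -> IO String
readPrefix path n = withFile path ReadMode (\h -> hGetAtMost h n)
```

With this, the body of main could become something like charList <- readPrefix "input.txt" 4, followed by the same map transformChar step. No lazy I/O is involved, so exactly as many characters are read as are returned.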
If you are looking for a truly effectful streaming solution, I'd look into pipes or conduit instead of unsafeInterleaveIO.
Upvotes: 2