thor

Reputation: 22520

How to efficiently reuse input from stdin in Haskell

I understand that I should not try to re-read from stdin, because doing so fails with a "handle is closed" IO error. For example, in the code below:

main = do
  x <- getContents
  putStrLn $ map id x
  x <- getContents     --problem line
  putStrLn x

the second call x <- getContents will cause the error:

test: <stdin>: hGetContents: illegal operation (handle is closed)

Of course, I can omit the second call to getContents and simply reuse x:

main = do
  x <- getContents
  putStrLn $ map id x
  putStrLn x

But will this become a performance/memory issue? Will GHC have to keep all of the contents read from stdin in the main memory?

I imagine that the first time x is consumed, GHC can throw away the portions of x that have already been processed. So theoretically, GHC could get by with a small, constant amount of memory for the processing. But since we are going to use x again (and again), it seems that GHC cannot throw anything away (nor can it read the data again from stdin).

Is my understanding about the memory implications here correct? And if so, is there a fix?

Upvotes: 1

Views: 540

Answers (1)

melpomene

Reputation: 85827

Yes, your understanding is correct: if you reuse x, GHC has to keep all of it in memory.

I think a possible fix is to consume it lazily, in a single pass.

Let's say you want to output x to several output handles hdls :: [Handle]. The naive approach is:

import Control.Monad (forM_)
import System.IO (hPutStr)

main :: IO ()
main = do
    x <- getContents
    forM_ hdls $ \hdl -> do
        hPutStr hdl x

This will read stdin into x as the first hPutStr traverses the string (at least for unbuffered handles, hPutStr is simply a loop that calls hPutChar for each character in the string). From then on, the whole string is kept in memory for all the remaining handles.

Alternatively:

import Control.Monad (forM_)
import System.IO (hPutChar)

main :: IO ()
main = do
    x <- getContents
    forM_ x $ \c -> do
        forM_ hdls $ \hdl -> do
            hPutChar hdl c

Here we've transposed the loops: Instead of iterating over the handles (and for each handle iterating over the input characters), we iterate over the input characters, and for each character, we print it to each handle.

I haven't tested it, but this form should guarantee that we don't need a lot of memory because each input character c is used once and then discarded.
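If the goal is to compute several results from the input rather than to copy it to several handles, the same "traverse x once" idea can be sketched as a single strict fold. This is only an illustration under assumptions: Counts, step and summarize are hypothetical names, and counting characters and lines stands in for whatever you actually want to compute. Because x is only referenced by the one fold, the parts already consumed should be eligible for garbage collection.

```haskell
import Data.List (foldl')

-- Hypothetical accumulator: strict fields force both counters at each step,
-- so no chain of unevaluated thunks builds up while folding.
data Counts = Counts !Int !Int

-- Update both results for one input character.
step :: Counts -> Char -> Counts
step (Counts chars ls) c =
    Counts (chars + 1) (if c == '\n' then ls + 1 else ls)

-- Compute (number of characters, number of newlines) in a single pass.
summarize :: String -> (Int, Int)
summarize s = case foldl' step (Counts 0 0) s of
    Counts chars ls -> (chars, ls)

main :: IO ()
main = do
    x <- getContents
    -- x is used exactly once, by summarize
    let (chars, ls) = summarize x
    putStrLn ("characters: " ++ show chars)
    putStrLn ("lines: " ++ show ls)
```

By contrast, writing length x and length (lines x) as two separate expressions would traverse x twice, and so would keep the whole input alive between the two traversals.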

Upvotes: 2
