
Why is sequence [getLine, getLine, getLine] not evaluated lazily?

main = do
  input <- sequence [getLine, getLine, getLine]
  mapM_ print input

Let's see this program in action:

m@m-X555LJ:~$ runhaskell wtf.hs
asdf
jkl
powe
"asdf"
"jkl"
"powe"

To my surprise, there seems to be no laziness here. All three getLine actions are evaluated eagerly, the values read are stored in memory, and only then are they all printed.

Compare to this:

main = do
  input <- fmap lines getContents
  mapM_ print input

Let's see this in action:

m@m-X555LJ:~$ runhaskell wtf.hs
asdf
"asdf"
lkj
"lkj"
power
"power"

Totally different stuff. Lines are read one by one and printed one by one. Which is odd to me because I don't really see any differences between these two programs.

From LearnYouAHaskell:

When used with I/O actions, sequenceA is the same thing as sequence! It takes a list of I/O actions and returns an I/O action that will perform each of those actions and have as its result a list of the results of those I/O actions. That's because to turn an [IO a] value into an IO [a] value, to make an I/O action that yields a list of results when performed, all those I/O actions have to be sequenced so that they're then performed one after the other when evaluation is forced. You can't get the result of an I/O action without performing it.

I'm confused. I don't need to perform ALL IO actions to get the results of just one.

A few paragraphs earlier the book shows a definition of sequenceA:

sequenceA :: (Applicative f) => [f a] -> f [a]  
sequenceA [] = pure []  
sequenceA (x:xs) = (:) <$> x <*> sequenceA xs

Nice recursion; nothing here hints to me that this recursion should not be lazy. Just like in any other recursion, to get the head of the returned list Haskell doesn't have to go through ALL the steps of the recursion!

Compare:

rec :: Int -> [Int]
rec n = n:(rec (n+1))

main = print (head (rec 5))

In action:

m@m-X555LJ:~$ runhaskell wtf.hs
5
m@m-X555LJ:~$

Clearly, the recursion here is performed lazily, not eagerly.

Then why is the recursion in the sequence [getLine, getLine, getLine] example performed eagerly?

As to why it is important that IO actions are run in order regardless of the results: Imagine an action createFile :: IO () and writeToFile :: IO (). When I do a sequence [createFile, writeToFile] I'd hope that they're both done and in order, even though I don't care about their actual results (which are both the very boring value ()) at all!

I'm not sure how this applies to this Q.

Maybe I'll word my Q this way...

In my mind this:

do
    input <- sequence [getLine, getLine, getLine]
    mapM_ print input

should reduce to something like this:

do
    input <- do
       input <- concat ( map (fmap (:[])) [getLine, getLine, getLine] )
       return input
    mapM_ print input

Which, in turn, should reduce to something like this (pseudocode, sorry):

do
    [ perform print on the result of getLine,
      perform print on the result of getLine,
      perform print on the result of getLine
    ] and discard the results of those prints, since mapM_ (unlike mapM) discards the results

Upvotes: 9

Answers (2)

Cubic

Reputation: 15703

getContents is lazy, getLine isn't. Lazy IO isn't a feature of Haskell per se, it's a feature of some particular IO actions.

I'm confused. I don't need to perform ALL IO actions to get the results of just one.

Yes you do! That is one of the most important features of IO, that if you write a >> b or equivalently,

do a
   b

then you can be sure that a is definitely "run" before b (see footnote). getContents is actually no different: it "runs" before whatever comes after it. But the result it returns is a sneaky one that quietly does more IO when you try to evaluate it. That is the surprising bit, and it can lead to some very interesting results in practice (like the file you're reading the contents of being deleted or changed while you're still processing the results of getContents). So in practical programs you probably shouldn't use it; it mostly exists for convenience in programs where you don't care about such things (code golf, throwaway scripts, or teaching, for instance).

As to why it is important that IO actions are run in order regardless of the results: Imagine an action createFile :: IO () and writeToFile :: IO (). When I do a sequence [createFile, writeToFile] I'd hope that they're both done and in order, even though I don't care about their actual results (which are both the very boring value ()) at all!

Addressing the edit:

should reduce to something like this:

do
    input <- do
       input <- concat ( map (fmap (:[])) [getLine, getLine, getLine] )
       return input
    mapM_ print input

No, it actually turns into something like this:

do 
  input <- do
    x <- getLine
    y <- getLine
    z <- getLine
    return [x,y,z]
  mapM_ print input

The actual definition of sequence is more or less this:

sequence [] = return []
sequence (a:as) = do
  x <- a
  fmap (x:) $ sequence as
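
To see the difference in ordering without typing at a terminal, here is a small sketch that replaces IO with a logging pair. It assumes base 4.9 or later, where a pair with a Monoid first component is a Monad. The names Logged, readLine, printLine, batch and interleaved are made up for this illustration; they are not library functions.

```haskell
-- A pair (log, result) is a Monad when the log is a Monoid,
-- so it can stand in for IO while recording the order of effects.
type Logged a = ([String], a)

-- Hypothetical stand-in for getLine: "reads" the given string.
readLine :: String -> Logged String
readLine s = (["read " ++ s], s)

-- Hypothetical stand-in for print.
printLine :: String -> Logged ()
printLine s = (["print " ++ s], ())

-- What the question's program does: run every action, then handle the results.
batch :: Monad m => (a -> m ()) -> [m a] -> m ()
batch k acts = sequence acts >>= mapM_ k

-- What the question expected: handle each result right after its action runs.
interleaved :: Monad m => (a -> m ()) -> [m a] -> m ()
interleaved k = mapM_ (>>= k)

main :: IO ()
main = do
  let acts = map readLine ["asdf", "jkl"]
  print (fst (batch printLine acts))
  -- ["read asdf","read jkl","print asdf","print jkl"]
  print (fst (interleaved printLine acts))
  -- ["read asdf","print asdf","read jkl","print jkl"]
```

In IO, interleaved print [getLine, getLine, getLine] should read and print one line at a time, with no lazy IO involved.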

Upvotes: 6

chi

Reputation: 116174

Technically, in

sequenceA (x:xs) = (:) <$> x <*> sequenceA xs

we find <*>, which first runs the action on the left, then the action on the right, and finally applies the result of the first to the result of the second. This is what makes the first effect in the list occur first, and so on.

Indeed, on monads, f <*> x is equivalent to

do theF <- f
   theX <- x
   return (theF theX)
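
A minimal sketch of this, using a logging pair in place of IO so the effects are visible as data. It assumes base 4.9 or later for the Monad instance on pairs. The names mySequenceA, apDo and step are invented for the example; mySequenceA just copies the book's definition under a name that does not clash with the Prelude.

```haskell
-- The book's definition, renamed to avoid clashing with the Prelude.
mySequenceA :: Applicative f => [f a] -> f [a]
mySequenceA []     = pure []
mySequenceA (x:xs) = (:) <$> x <*> mySequenceA xs

-- The do-block reading of f <*> x on a monad.
apDo :: Monad m => m (a -> b) -> m a -> m b
apDo f x = do
  theF <- f
  theX <- x
  return (theF theX)

-- Hypothetical "action": logs one effect and returns its argument.
step :: Int -> ([String], Int)
step n = (["effect " ++ show n], n)

main :: IO ()
main = do
  print (mySequenceA (map step [1, 2, 3]))
  -- (["effect 1","effect 2","effect 3"],[1,2,3])

  -- Asking only for the head of the result list does not drop any effect:
  print (fmap head (mySequenceA (map step [1, 2, 3])))
  -- (["effect 1","effect 2","effect 3"],1)

  let f = (["f"], (+ 1)) :: ([String], Int -> Int)
      x = (["x"], 41)    :: ([String], Int)
  print ((f <*> x) == apDo f x)
  -- True
```

The result list itself is still lazy; it is the effects that are all combined, left to right, before any result is handed back.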

More generally, note that IO actions are executed in order, first to last (see below for a few rare exceptions). Doing IO in a completely lazy way would be a nightmare for the programmer. For instance, consider:

do let aX = print "x" >> return 4
       aY = print "y" >> return 10
   x <- aX
   y <- aY
   print (x+y)

Haskell guarantees that the output is "x", "y", 14, in that order. If we had completely lazy IO we could also get "y", "x", 14, depending on which argument is forced first by +. In that case, we would need to know exactly the order in which the lazy thunks are demanded by every operation, which is something the programmer definitely does not want to care about. Under such detailed semantics, x + y is no longer equivalent to y + x, breaking equational reasoning in many cases.

Now, if we wanted to force IO to be lazy we could use one of the forbidden functions, e.g.

do let aX = unsafeInterleaveIO (print "x" >> return 4)
       aY = unsafeInterleaveIO (print "y" >> return 10)
   x <- aX
   y <- aY
   print (x+y)

The above code makes aX and aY lazy IO actions, and the order of the output is now at the whim of the compiler and the library implementation of +. This is in general dangerous, hence the unsafeness of lazy IO.

Now, about the exceptions. Some IO actions which only read from the environment, like getContents, are implemented with lazy IO (unsafeInterleaveIO). The designers felt that for such reads, lazy IO can be acceptable, and that the precise timing of the reads is not that important in many cases.
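
As a rough illustration of the idea (not the actual implementation of getContents), here is a sketch of a lazy variant of sequence built on unsafeInterleaveIO. The names lazySequence and demo are hypothetical. A counter in an IORef records how many actions have really run at each point.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper: each action is deferred until the
-- corresponding list cell is demanded.
lazySequence :: [IO a] -> IO [a]
lazySequence []     = return []
lazySequence (a:as) = unsafeInterleaveIO $ do
  x  <- a
  xs <- lazySequence as   -- returns at once; the rest stays deferred
  return (x : xs)

-- Returns how many actions had run: right after lazySequence,
-- after demanding the first cell, and after demanding the whole list.
demo :: IO (Int, Int, Int)
demo = do
  counter <- newIORef (0 :: Int)
  let tick = modifyIORef' counter (+ 1) >> readIORef counter
  xs <- lazySequence [tick, tick, tick]
  atStart    <- readIORef counter
  _          <- evaluate (length (take 1 xs))  -- forces only the first cell
  afterFirst <- readIORef counter
  _          <- evaluate (length xs)           -- forces the whole spine
  afterAll   <- readIORef counter
  return (atStart, afterFirst, afterAll)

main :: IO ()
main = demo >>= print
-- (0,1,3)
```

Here the effects happen when the list is consumed rather than where lazySequence appears in the do block, which is exactly the property that makes lazy IO hard to reason about.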

Nowadays, this is controversial. While it can be convenient, lazy IO can be too unpredictable in many cases. For instance, we can't know when the file will be closed, and that could matter if we're reading from a socket. We also need to be very careful not to force the reads too early: that often leads to a deadlock when reading from a pipe. Today, it is usually preferred to avoid lazy IO and to use a library like pipes or conduit for "streaming"-like operations, where there is no ambiguity.

Upvotes: 4
