main = do
  input <- sequence [getLine, getLine, getLine]
  mapM_ print input
Let's see this program in action:
m@m-X555LJ:~$ runhaskell wtf.hs
asdf
jkl
powe
"asdf"
"jkl"
"powe"
Surprisingly to me, there seems to be no laziness here. Instead, all 3 getLines are evaluated eagerly, the read values are stored in memory, and only then are they all printed.
Compare to this:
main = do
  input <- fmap lines getContents
  mapM_ print input
Let's see this in action:
m@m-X555LJ:~$ runhaskell wtf.hs
asdf
"asdf"
lkj
"lkj"
power
"power"
Totally different stuff. Lines are read one by one and printed one by one. This is odd to me, because I don't really see any difference between these two programs.
From LearnYouAHaskell:
When used with I/O actions, sequenceA is the same thing as sequence! It takes a list of I/O actions and returns an I/O action that will perform each of those actions and have as its result a list of the results of those I/O actions. That's because to turn an [IO a] value into an IO [a] value, to make an I/O action that yields a list of results when performed, all those I/O actions have to be sequenced so that they're then performed one after the other when evaluation is forced. You can't get the result of an I/O action without performing it.
I'm confused. I don't need to perform ALL IO actions to get the results of just one.
A few paragraphs earlier the book shows a definition of sequenceA:
sequenceA :: (Applicative f) => [f a] -> f [a]
sequenceA [] = pure []
sequenceA (x:xs) = (:) <$> x <*> sequenceA xs
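To check that definition, here is a minimal runnable copy, renamed mySequenceA (a name chosen here only to avoid clashing with the Prelude), tried on Maybe and on lists, where the effects are easy to see. It is a sketch for experimentation, not the library's actual implementation.

```haskell
-- The book's definition, renamed so it does not clash with Prelude's sequenceA.
mySequenceA :: Applicative f => [f a] -> f [a]
mySequenceA []     = pure []
mySequenceA (x:xs) = (:) <$> x <*> mySequenceA xs

main :: IO ()
main = do
  -- Maybe: any Nothing makes the whole result Nothing.
  print (mySequenceA [Just 1, Just 2, Just 3])    -- Just [1,2,3]
  print (mySequenceA [Just 1, Nothing, Just 3])   -- Nothing
  -- Lists: every combination, with the first list varying slowest.
  print (mySequenceA [[1, 2], [3, 4]])            -- [[1,3],[1,4],[2,3],[2,4]]
  -- Agrees with the library version on this input.
  print (mySequenceA [[1, 2], [3, 4]] == sequenceA [[1, 2], [3, 4]])  -- True
```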
Nice recursion; nothing here hints to me that this recursion should not be lazy. Just like in any other recursion, to get the head of the returned list Haskell doesn't have to go down through ALL the steps of the recursion!
Compare:
rec :: Int -> [Int]
rec n = n:(rec (n+1))
main = print (head (rec 5))
In action:
m@m-X555LJ:~$ runhaskell wtf.hs
5
m@m-X555LJ:~$
Clearly, the recursion here is performed lazily, not eagerly.
Then why is the recursion in the sequence [getLine, getLine, getLine] example performed eagerly?
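An analogy that may help, sketched with Maybe instead of IO (seqA is the book's definition under a hypothetical name): for an Applicative, the outer shape of the f [a] result depends on every element of the input list, so it cannot be decided after looking only at the first one. The lazy rec recursion has no such dependency. This is only an analogy; for IO the deciding fact is the one the answers describe, namely that each action is actually performed when the combined action runs.

```haskell
-- The book's definition under another name.
seqA :: Applicative f => [f a] -> f [a]
seqA []     = pure []
seqA (x:xs) = (:) <$> x <*> seqA xs

-- The lazy recursion from the question: head only needs the first cons cell.
rec :: Int -> [Int]
rec n = n : rec (n + 1)

main :: IO ()
main = do
  print (head (rec 5))                                  -- 5; the infinite tail is never built
  -- A Nothing short-circuits: the element after it is never inspected.
  print (seqA [Just 1, Nothing, error "never forced"])  -- Nothing
  -- But to answer "Just or Nothing?" seqA must inspect every element,
  -- even when the caller only wants the head of the list inside.
  print (fmap head (seqA [Just 1, Just 2, Just 3]))     -- Just 1
  -- fmap head (seqA (map Just [1 ..])) would never finish, unlike head (rec 5).
```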
As to why it is important that IO actions are run in order regardless of the results: Imagine an action createFile :: IO () and writeToFile :: IO (). When I do a sequence [createFile, writeToFile] I'd hope that they're both done and in order, even though I don't care about their actual results (which are both the very boring value ()) at all!
I'm not sure how this applies to this Q.
Maybe I'll word my Q this way...
In my mind this:
do
  input <- sequence [getLine, getLine, getLine]
  mapM_ print input
should reduce to something like this:
do
  input <- do
    input <- concat ( map (fmap (:[])) [getLine, getLine, getLine] )
    return input
  mapM_ print input
Which, in turn, should reduce to something like this (pseudocode, sorry):
do
  [ perform print on the result of getLine,
    perform print on the result of getLine,
    perform print on the result of getLine
  ] and discard the results of those prints, since print was applied with mapM_, which discards the results, unlike mapM
Upvotes: 9
Views: 190
Reputation: 15703
getContents is lazy, getLine isn't. Lazy IO isn't a feature of Haskell per se, it's a feature of some particular IO actions.
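A minimal sketch of the lazy behaviour, using a hypothetical helper named quoteLines: interact hands the lazily read result of getContents to a pure function, so the output for each line can appear as soon as that line has been read, with no explicit loop.

```haskell
-- Pure transformation: quote every input line, like mapM_ print does.
quoteLines :: String -> String
quoteLines = unlines . map show . lines

-- interact reads stdin with the lazy getContents, so output for a line can
-- appear as soon as that line has been read (assuming stdout is
-- line-buffered, which is GHC's default when it is a terminal).
main :: IO ()
main = interact quoteLines
```

Keeping the transformation pure also makes it testable without any IO, for example quoteLines "asdf\njkl\n".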
I'm confused. I don't need to perform ALL IO actions to get the results of just one.
Yes you do! That is one of the most important features of IO: if you write a >> b
or, equivalently,
do a
   b
then you can be sure that a is definitely "run" before b (see footnote). getContents is actually the same: it "runs" before whatever comes after it... but the result it returns is a sneaky result that sneakily does more IO when you try to evaluate it. That is actually the surprising bit, and it can lead to some very interesting results in practice (like the file you're reading the contents of being deleted or changed while you're processing the results of getContents), so in practical programs you probably shouldn't be using it. It mostly exists for convenience in programs where you don't care about such things (Code Golf, throwaway scripts or teaching, for instance).
As to why it is important that IO actions are run in order regardless of the results: Imagine an action createFile :: IO () and writeToFile :: IO (). When I do a sequence [createFile, writeToFile] I'd hope that they're both done and in order, even though I don't care about their actual results (which are both the very boring value ()) at all!
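The point can be modelled purely, as a sketch: the pair type ([String], a) is a monad in base whose only effect is appending to a log, so it can stand in for IO. The createFile and writeToFile below are hypothetical stand-ins that only record their names; their results are (), yet sequence_ keeps both effects, in list order.

```haskell
-- A pure stand-in for IO: the first component is a log of "effects".
-- base provides  instance Monoid w => Monad ((,) w).
type Logged a = ([String], a)

-- Hypothetical actions whose results are the boring value ().
createFile :: Logged ()
createFile = (["createFile"], ())

writeToFile :: Logged ()
writeToFile = (["writeToFile"], ())

main :: IO ()
main = do
  -- The results are discarded, but both effects happen, in list order.
  print (sequence_ [createFile, writeToFile])  -- (["createFile","writeToFile"],())
  print (sequence_ [writeToFile, createFile])  -- (["writeToFile","createFile"],())
```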
Addressing the edit:
should reduce to something like this:
do
  input <- do
    input <- concat ( map (fmap (:[])) [getLine, getLine, getLine] )
    return input
  mapM_ print input
No, it actually turns into something like this:
do
  input <- do
    x <- getLine
    y <- getLine
    z <- getLine
    return [x,y,z]
  mapM_ print input
The actual definition of sequence is more or less this:
sequence [] = return []
sequence (a:as) = do
  x <- a
  fmap (x:) $ sequence as
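To see why the two programs in the question print at different times, here is a sketch that swaps IO for the pair-as-log monad from base, ([String], a), where each "effect" only appends to a log. fakeGetLine and fakePrint are hypothetical stand-ins for getLine and print. readAllThenPrint has the shape of the sequence-then-print program; interleaved binds each read straight into its own print, which is what gives line-by-line behaviour.

```haskell
-- Pair-as-log monad from base: log entries appear in effect order.
fakeGetLine :: String -> ([String], String)
fakeGetLine s = (["read " ++ s], s)

fakePrint :: String -> ([String], ())
fakePrint s = (["print " ++ s], ())

reads3 :: [([String], String)]
reads3 = map fakeGetLine ["asdf", "jkl", "powe"]

-- Shape of the first program: sequence all reads, then print the results.
readAllThenPrint :: ([String], ())
readAllThenPrint = do
  input <- sequence reads3
  mapM_ fakePrint input

-- Interleaved: bind each read straight into its print.
-- The real IO analogue would be  mapM_ (>>= print) [getLine, getLine, getLine]
interleaved :: ([String], ())
interleaved = mapM_ (>>= fakePrint) reads3

main :: IO ()
main = do
  print (fst readAllThenPrint)
  -- ["read asdf","read jkl","read powe","print asdf","print jkl","print powe"]
  print (fst interleaved)
  -- ["read asdf","print asdf","read jkl","print jkl","read powe","print powe"]
```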
Upvotes: 6
Reputation: 116174
Technically, in
sequenceA (x:xs) = (:) <$> x <*> sequenceA xs
we find <*>, which first runs the action on the left, then the action on the right, and finally applies the function produced by the left one to the value produced by the right one. This is what makes the first effect in the list occur first, and so on.
Indeed, on monads, f <*> x is equivalent to
do theF <- f
   theX <- x
   return (theF theX)
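As a quick check of that equivalence, a sketch comparing <*> with the do-block version (apDo is a hypothetical name; base ships the same function as Control.Monad.ap). With the pair-as-log monad, the log shows the function's effect happening first.

```haskell
import Control.Monad (ap)

-- The do-block from the answer, as a function.
apDo :: Monad m => m (a -> b) -> m a -> m b
apDo f x = do
  theF <- f
  theX <- x
  return (theF theX)

main :: IO ()
main = do
  let f = (["run f"], (+ 1)) :: ([String], Int -> Int)
      x = (["run x"], 41)    :: ([String], Int)
  print (f <*> x)   -- (["run f","run x"],42)
  print (apDo f x)  -- (["run f","run x"],42)
  print (ap f x)    -- (["run f","run x"],42)
  -- Lists: the effect is nondeterminism; the functions vary slowest.
  print ([(+ 1), (* 2)] <*> [10, 20])   -- [11,21,20,40]
  print (apDo [(+ 1), (* 2)] [10, 20])  -- [11,21,20,40]
```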
More generally, note that IO actions are normally executed in order, first to last (see below for a few rare exceptions). Doing IO in a completely lazy way would be a nightmare for the programmer. For instance, consider:
do let aX = print "x" >> return 4
       aY = print "y" >> return 10
   x <- aX
   y <- aY
   print (x+y)
Haskell guarantees that the output is "x", "y", 14, in that order. If we had completely lazy IO we could also get "y", "x", 14, depending on which argument is forced first by +. In such a case, we would need to know exactly the order in which the lazy thunks are demanded by every operation, which is something the programmer definitely does not want to care about. Under such detailed semantics, x + y is no longer equivalent to y + x, breaking equational reasoning in many cases.
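The same guarantee can be seen in a pure model, as a sketch: replace print with an entry in the pair-as-log monad from base (logMsg is a hypothetical helper). The log order is fixed by the order of the binds, whichever way the sum is written.

```haskell
-- Pair-as-log stand-in for IO: logMsg records a message instead of printing it.
logMsg :: String -> ([String], ())
logMsg s = ([s], ())

aX, aY :: ([String], Int)
aX = logMsg "x" >> return 4
aY = logMsg "y" >> return 10

-- Same binds; the result is computed as x + y or as y + x.
sumXY, sumYX :: ([String], ())
sumXY = do
  x <- aX
  y <- aY
  logMsg (show (x + y))
sumYX = do
  x <- aX
  y <- aY
  logMsg (show (y + x))

main :: IO ()
main = do
  print (fst sumXY)  -- ["x","y","14"]
  print (fst sumYX)  -- ["x","y","14"]
```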
Now, if we wanted to force IO to be lazy we could use one of the forbidden functions, e.g.
-- requires: import System.IO.Unsafe (unsafeInterleaveIO)
do let aX = unsafeInterleaveIO (print "x" >> return 4)
       aY = unsafeInterleaveIO (print "y" >> return 10)
   x <- aX
   y <- aY
   print (x+y)
The above code makes aX and aY lazy IO actions, and the order of the output is now at the whim of the compiler and the library implementation of +. This is in general dangerous, hence the unsafeness of lazy IO.
Now, about the exceptions. Some IO actions which only read from the environment, like getContents, were implemented with lazy IO (unsafeInterleaveIO). The designers felt that for such reads lazy IO can be acceptable, and that the precise timing of the reads is not that important in many cases.
Nowadays, this is controversial. While it can be convenient, lazy IO can be too unpredictable in many cases. For instance, we can't know when the file will be closed, and that could matter if we're reading from a socket. We also need to be very careful not to force the reads too early: that often leads to a deadlock when reading from a pipe. Today, it is usually preferred to avoid lazy IO and resort to a library like pipes or conduit for "streaming"-like operations, where there is no ambiguity.
Upvotes: 4