m0meni
m0meni

Reputation: 16455

In Haskell what are the inner workings of list comprehension?

I'm rather new to Haskell and was looking at this post here: Cartesian product of 2 lists in Haskell.

In the answer there is this snippet of code:

cartProd xs ys = [(x,y) | x <- xs, y <- ys]

Which with these two lists:

xs = [1,2,3]
ys = [4,5,6]

would yield

[(1,4),(1,5),(1,6),(2,4),(2,5),(2,6),(3,4),(3,5),(3,6)]

Had I not seen this result I would have assumed it would just have returned

[(1,4),(2,5),(3,6)]

because it would traverse both lists at the same time.

But now it - to programming languages I know better - looks like a double for loop used to traverse a matrix:

for (int x = 1; x < 4; x++)
    for(int y = 4; y < 7; y++)
        //make tuple (x,y)

What causes a list comprehension to behave in this manner?

Upvotes: 3

Views: 519

Answers (3)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 477794

This introduction explains the syntax of list comprehension. Basically one can say that every x <- list means an additional nested "for"-loop to generate tuples and every predicate is simply tested. Thus the expression:

[(x,y) | x <- xs, even x, y <- ys, even 3*y-div x 2]

Would be translated in an imperative language as:

for (var x : xs) {
    if(even(x)) {
    for(var y : ys) {
        if(even(3*y-x/2)) {
            yield (x,y)
        }
    }
}

yield is a keyword that is sometimes used with co-routines. Furthermore as for yield the evaluation is done lazily. This enables for instance to generate all even integers as:

[x|x <-[2..],even x]

List monads

In order to understand list comprehension fundamentally, one needs to know what Monads are. Every list comprehension can be translated into a list monad. For instance your example is translated into:

do x <- xs
   (do y <- ys
       return (x,y))

Which is again syntactical sugar for:

xs >>= (\x -> (ys >>= \y -> return (x,y)))

A monad is an important concept in functional programming (and probably one better reads the wikipedia page), because it is a bit hard to master. The sometimes say a monads are like burritos,....

Once you more or less understand a monad: a monad is a type-class with a return statement and a >>= channeling statement. Now the return statement for the inner part is easy:

return x = [x]

So that means that each time x and y are set, you will create a tuple (x,y) and return it as a singleton list: thus [(x,y)]. Now the "bind" operator >>= needs to "glue" ys and \y -> return (x,y) together. This is done by implementing it as:

(>>=) xs f = concat $ map f xs

In other words you do a mapping and concatenate the result of the mapping.

Now if take the second part of the unsugared expression into account:

ys >>= \y -> return (x,y)

It means that for a given x (we now abstract away), we will map every element in ys to a tuple (x,y) and return it. We will thus generate a list of lists with every list being a singleton containing a tuple. Something like (if ys=[1,2]):

[[(x,1)],[(x,2)]]

Now the >>= will furthermore concat it into:

\x -> [(x,1),(x,2)]

Until now, we've abstracted the x away (assumed it was one). But now we can take the first part of that expression:

xs >>= \x -> [(x,1),(x,2)]

If xs=[3,5], it means we will create again lists:

[[(3,1),(3,2)],[(5,1),(5,2)]]

and after the concat:

[(3,1),(3,2),(5,1),(5,2)]

Which is what we expect for:

[(x,y)|x<-[3,5],y<-[1,2]]

Upvotes: 9

chi
chi

Reputation: 116174

Quoting from the Haskell Report, list comprehensions are evaluated as follows:

[ e | True ]   = [e]
[ e | q ]      = [ e | q, True ]
[ e | b,  Q  ] = if b then [ e | Q ] else []
[ e | p <- l,  Q ] = let ok p = [ e | Q ]
                         ok _ = []
                     in concatMap ok  l
[ e | let decls,  Q ] = let decls in [ e | Q ]

In your case, the relevant part is, since the pattern p is just a variable x:

[ e | x <- l, Q ] = concatMap (\x -> [ e | Q ]) l

More concretely, the comprehension [(x,y) | x <- xs, y <- ys] is translated to

concatMap (\x -> [(x,y) | y <- ys]) xs

Which is, by definition of concatMap

concat (map (\x -> [(x,y) | y <- ys]) xs)

Let's substitute concrete values for xs,ys:

concat (map (\x -> [(x,y) | y <- [4,5,6]]) [1,2,3])

Applying map:

concat [ [(1,y) | y <- [4,5,6]] 
       , [(2,y) | y <- [4,5,6]] 
       , [(3,y) | y <- [4,5,6]] ]

Evaluating the inner list comprehensions: (these could be translated again using the laws above, but I'll make it short)

concat [ [(1,4),(1,5),(1,6)]
       , [(2,4),(2,5),(2,6)]
       , [(3,4),(3,5),(3,6)] ]

And by concatenating the above lists we obtain the result,

       [  (1,4),(1,5),(1,6) 
       ,  (2,4),(2,5),(2,6) 
       ,  (3,4),(3,5),(3,6)  ]

Note that GHC also implements as a Haskell extension the so-called parallel list comprehensions, which do operate as you expected:

> :set -XParallelListComp
> [(x,y)| x<-[1,2,3] | y <-[4,5,6]]
[(1,4),(2,5),(3,6)]

Internally, they use the zip (or rather, zipWith) function.

Upvotes: 8

Ben
Ben

Reputation: 71590

The ideas behind list comprehension syntax comes from set-builder notation in mathematics.

In mathematics, one can write something like:

{ (x, y) | x ∈ xs, y ∈ ys }

To mean "the set of all elements of the form (x, y), where x is an element of the set xs and y is an element of the set ys".

Now we want to take this set-builder idea and turn it into a "list-builder" syntax for programming languages. So we want this:

[ (x, y) | x <- xs, y <- ys ]

To mean "the list containing all elements of the form (x, y), where x was an element obtained from xs and y is an element obtained from ys".

Obviously if this list notation produced the list [(1,4),(2,5),(3,6)] (when xs = [1, 2, 3] and ys = [4, 5, 6]) it simply wouldn't be all the pairs of the required form: (1, 6) is an element that can be obtained from xs paired with an element that can be obtained from ys, but it's not in your list.

So the really trite answer to "What causes a list comprehension to behave in this manner?" is that it was programmed to behave in that manner, because that's how the people who came up with it wanted it to behave.

There are many different ways you can program this behaviour. You can (as the OP noticed) use nested loops, or you can use functions like map and concat, or you can use the Monad instance for lists, etc. As such, you can translate list comprehension syntax into any of these in order to demonstrate that list comprehensions are "really monadic" or "really nested loops" or whatever.

But fundamentally, list comprehensions are combinatorial rather than zipping because the language designers deliberately chose the syntax to mean that, in order for it to be a bit like set-builder notation in mathematics1. That "x remains constant while y increments" isn't just an interesting quirk of the implementation, the implementation was chosen specifically to do that. If there were some alternate universe where the list monad (or nested loops, etc) wasn't useful for this purpose, list comprehensions would not produce different results in that universe, they would be implemented some other way to produce the same results.


1 There are significant differences between list comprehension syntax and set-builder notation, even though they look obviously similar. A fundamental difference comes from using lists rather than sets; an element is simply either in a set or not, but lists contain elements at specific indices (including the possibility of containing a given element at multiple indices).

So list comprehension syntax has to define the order it will produce its elements (and exactly how many elements it will produce), and that makes list comprehensions fundamentally about enumeration, whereas set-builder notation is fundamentally about membership. The set-builder notation { (x, y) | x ∈ xs, y ∈ ys } is really saying "you can answer whether a given (x, y) is in the set we're building by checking whether x ∈ xs and y ∈ ys", while the list comprehension [ (x, y) | x <- xs, y <- ys ] is saying "you can enumerate the elements of the list we're building by enumerating xs and then for every element also enumerate ys"

Upvotes: 1

Related Questions