Ganymede
Ganymede

Reputation: 3665

Why does this list comprehension fail?

I'm trying to select unique elements from a list like this:

x = [1,1,2,3,4]
s = [e | e <- x, not (e `elem` s)]

It doesn't produce errors, but when I try to read from s it seems like the program hangs. Why?

Plus, what's the right way to do this?

Upvotes: 3

Views: 492

Answers (3)

J. Abrahamson
J. Abrahamson

Reputation: 74394

Note that while Russel's Paradox helps to suggest that this might be non-computable, it still fails even if you change it to s = [e | e <- x, elem e s].

Here's an instructive manual expansion. For any non-empty list, x

s = [e | e <- x, not (e `elem` s)]

simplifies to

s = do e <- x
       guard (not (e `elem` s))
       return e

s = x >>= \e -> if (not (e `elem` s)) then return e else mzero

s = concatMap (\e -> if (not (e `elem` s)) then [e] else []) x

s = foldr ((++) . (\e -> if (not (e `elem` s)) then [e] else [])) [] x

s = foldr (\e xs -> if (not (e `elem` s)) then (e:xs) else xs) [] x

s = foldr (\e ys -> if (e `elem` s) then ys else (e:ys)) [] x

which we can then begin evaluating. Since x was non-empty we can replace it with x:xs and inline a foldr

let f = (\e ys -> if (e `elem` s) then ys else (e:ys))

s = f x (foldr f [] xs)

s = (\ys -> if (x `elem` s) then ys else (x:ys)) (foldr f [] xs)

s = (\ys -> if (x `elem` f x (foldr f [] xs)) then ys else (x:ys)) (foldr f [] xs)

which is where we have our infinite loop—in order to evaluate f x (foldr f [] xs) we must evaluate f x (foldr f [] xs). You might say that the definition of s is not "productive enough" to kickstart its self-recursion. Compare this to the trick fibs definition

fibs = 1:1:zipWith (+) fibs (tail fibs)

which is kick-started with 1:1:... in order to be "productive enough". In the case of s, however, there's no (simple) way to be productive enough (see Will Ness' comment below for a fiendish workaround).


If we don't have the not there, it just switches the order of the branches on the if, which we never reach anyway.

Upvotes: 8

Joshua Taylor
Joshua Taylor

Reputation: 85913

I'm not much of a Haskell-er, but this seems like you've just coded up something sort of like1 Russell's paradox. Aren't you asking for a list s whose elements are those that are in x, but not in s?

s = [e | e <- [1,1,2,3,4], not (e `elem` s)]

So, consider what happens when you try to ask for the first element of s. Well, the first element from e is 1, so 1 will be the first element of s if not (1 `elem` s). Well, is (1 `elem` s)? We can check by iterating over the elements of s and seeing if 1 appeared. Well, let's start with the first element of s

In general suppose that some n is an element of s. Then what must be true? n must be an element of x (easy to check), and also not an element of s. But we supposed that it was an element of s. This is a contradiction. Therefore, no n can be an element of s, so s must be the empty list. Unfortunately, the Haskell compiler isn't doing the proof that we just did, it's trying to programmatically compute the elements of s.

To remove duplicate items from a list, you want the function that Neil Brown recommended in a comment, nub from Data.List:

nub::Eqa => [a] -> [a] Source

O(n^2). The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means ‘essence’.) It is a special case of nubBy, which allows the programmer to supply their own equality test.


  1. It's not actually Russell's paradox; Russell's paradox is about a set that contains only those sets that don't contain themselves. That set can't exist, because if it contains itself, then it must not contain itself, and if it does not contain itself, then it must contain itself.

Upvotes: 17

Sassa NF
Sassa NF

Reputation: 5406

s = [e | (e:t) <- tails x, not (e `elem` t)]

The above is not meant to be the most efficient solution, but demonstrating how you could reason about the solution: in order to include the element of x only once, we need to make sure it is the last such element in x. This means we can search for occurrence of the element in the tail of the list. Data.List.tails produces all sublists of the list, so we can include the head of a sublist, if it doesn't appear in the remainder of the sublist - this is the condition that the head of the sublist is the last such element in the original list.


Referencing the value you are defining can cause unterminating computation, if the function using the value is strict (eager). The function is strict, if it always needs the complete value of the argument in order to produce a result.

For example, length is strict in the number of elements of the list - but not necessarily the actual elements of the list. So length [[i..] | i <- [1..10]] terminates without computing the values of the elements in the list (the infinite lists. Yet, length [[i..] | i <- [1..]] does not terminate, because in order to return a result, it needs to compute existence of all elements, which can never end for a open range.

However,

gtlength :: Int -> [a] -> Ordering
gtlength n [] = n `compare` 0
gtlength 0 xs = GT
gtlength n xs = gtlength (n-1) $ tail xs

can terminate even for infinite lists, because it doesn't need to evaluate the entire list.

Your function hangs because elem is strict. In order to test for non-existence of a element, it needs to evaluate the entire list, which is not available.

Upvotes: 1

Related Questions