Reputation: 5410
I am new to both Haskell and programming. My question about binding in pattern-matched, recursive functions. For instance, suppose I have a function which checks whether a given list (x:xs) is a sublist of another list, (y:ys). My initial thought, following the examples in my textbook, was:
sublist [] ys = True
sublist xs [] = False
sublist (x:xs) (y:ys)
| x == y = sublist xs ys
| x /= y = sublist (x:xs) ys
This works on test data, e.g.,
sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
where I expected it to fail. I expect it to fail, since
sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
= sublist [2, 3] [2, 4, 1, 2, 3]
= sublist [3] [4, 1, 2, 3]
at which point, I thought, [3] = 3:[] will be matched with (x:xs) in sublist, and [4, 1, 2, 3] will be matched with (y:ys) in sublist. How, then, is sublist working?
Edit: Thanks to everyone here, I think I've solved my problem. As noted, I was ("subconsciously") wanting sublist to backtrack for me. Using the last answer (BMeph) posted as a guide, I decided to approach the problem differently, in order to solve the "binding problem," i.e., the "backtracking" problem.
subseq :: (Eq a) => [a] -> [a] -> Bool
subseq [] _ = True
subseq _ [] = False
subseq (x:xs) (y:ys) =
-- subseq' decides whether the list bound to (x:xs) = M is a prefix of the list
-- bound to L = (y:ys); it recurses through L and returns a Bool value. subseq
-- recurses through M and L, returning a disjunction of Bool
-- values. Each recursive call to subseq passes M and ys to subseq', which
-- decides whether M is a prefix of the **current list bound to ys**.
let subseq' :: (Eq a) => [a] -> [a] -> Bool
subseq' [] _ = True
subseq' _ [] = False
subseq' (x:xs) (y:ys) = (x == y) && subseq' xs ys
in subseq' (x:xs) (y:ys) || subseq (x:xs) ys
Upvotes: 8
Views: 3101
Reputation: 1462
I think where you may be misunderstanding, is that (when you first wrote the function) you assumed that in your check x /= y = sublist (x:xs) ys
you (sub-consciously?) assumed that the function would backtrack and re-do your function with the tail of the original second list, not the tail of whichever piece of the list you're using when it doesn't match.
One nice (and unsettling) thing about Haskell is just how ridiculously powerful it is.
For an example, here's how what you wanted should have looked:
sublist [] ys = True
sublist xs [] = False
sublist (x:xs) (y:ys) | x /= y = False
| otherwise = sublist xs ys || sublist (x:xs) ys
which will check for all of the pieces of the first list. The "official" definition for the function (look up "isInfixOf" in your documentation) has a few extra functions that basically means the same thing.
Here's another way write it, that looks more "explanatory" to my eyes:
sublist [] ys = True
sublist xs [] = False
sublist xs ys = (xs == take (length xs) ys) || sublist xs (tail ys)
Upvotes: 2
Reputation: 139501
Debug.Trace
is your friend. With sublist
instrumented as
sublist [] ys = trace ("A: [] " ++ show ys) True
sublist xs [] = trace ("B: " ++ (show xs) ++ " []") False
sublist (x:xs) (y:ys)
| x == y = trace (info "C" "==")
sublist xs ys
| x /= y = trace (info "D" "/=")
sublist (x:xs) ys
where info tag op =
tag ++ ": " ++ (show x) ++ " " ++ op ++ " " ++ (show y) ++
"; xs=" ++ (show xs) ++ ", ys=" ++ show ys
you see what's happening, namely that it's repeatedly throwing away the head of the second list until it finds a match:
*Main> sublist [1, 2, 3] [1, 2, 4, 1, 2, 3] C: 1 == 1; xs=[2,3], ys=[2,4,1,2,3] C: 2 == 2; xs=[3], ys=[4,1,2,3] D: 3 /= 4; xs=[], ys=[1,2,3] D: 3 /= 1; xs=[], ys=[2,3] D: 3 /= 2; xs=[], ys=[3] C: 3 == 3; xs=[], ys=[] A: [] [] True
Another tool that will help you implement sublist
correctly is Test.QuickCheck
, a library that automatically creates test data for use in verifying properties that you specify.
For example, say you want sublist
to treat xs
and ys
as sets and determine whether the former is a subset of the latter. We can use Data.Set
to specify this property:
prop_subsetOf xs ys =
sublist xs ys == fromList xs `isSubsetOf` fromList ys
where types = (xs :: [Int], ys :: [Int])
This says sublist
should be equivalent to the definition on the right. The prop_
prefix is a popular convention for naming test properties to be used with QuickCheck.
Running it also identifies a failure case:
*Main> quickCheck prop_subsetOf *** Failed! Falsifiable (after 6 tests): [0,0] [0]
Upvotes: 3
Reputation: 41511
YuppieNetworking and sdcwc have already explained how the matching works. So your sublist
searches for a sublist in the same sense as subsequence (not necessarily the same items in a row, there can be anything in between).
I want to note that you can often avoid explicit recursion to remove unnecessary items from the front of the list with dropWhile
. Also, I wanted to give an example how to check if two lists have the same prefixes (you need this to check if the second list contains items of the first one in a row).
The first example is similar to your function, it allows for items in between, but it uses dropWhile
to remove items in front of ys
:
-- Test:
-- map ("foo" `subListOf`) ["kungfoo", "f-o-o!", "bar"] == [True,True,False]
[] `subListOf` _ = True
(x:xs) `subListOf` ys =
let ys' = dropWhile (/= x) ys -- find the first x in ys
in case ys' of
(_:rest) -> xs `subListOf` rest
[] -> False
The second example looks for a "dense" sublist:
-- Test:
-- map ("foo" `denseSubListOf`) ["kungfoo!", "-f-o-o-"] == [True,False]
[] `denseSubListOf` _ = True
_ `denseSubListOf` [] = False
xs `denseSubListOf` ys =
let ps = zip xs ys in
(length ps == length xs && all (uncurry (==)) ps) -- same prefix of xs and ys
|| xs `denseSubListOf` (tail ys) -- or search further
Please note that to check that the second list contains all the first one in the beginning I compare elements pair by pair (I zip them together to do it).
It is easier to explain by example:
zip [1,2,3] [1,2,3,4,5] -- gives [(1,1), (2,2), (3,3)], 4 and 5 are truncated
uncurry (==) -- an equivalent of (\(a,b) -> a == b)
all -- gives True iff (uncurry (==)) is True for all pairs
length ps == length xs -- this is to ensue that the right list is long enough
Upvotes: 1
Reputation: 25654
sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
= sublist [2, 3] [2, 4, 1, 2, 3]
= sublist [3] [4, 1, 2, 3]
= sublist [3] [4, 1, 2, 3]
= sublist (3:[]) (4:[1,2,3]) -- Since 3 /= 4, we take sublist (x:xs) ys
= sublist (3:[]) [1,2,3]
= sublist (3:[]) (1:[2,3])
= sublist (3:[]) [2,3]
= sublist (3:[]) (2:[3])
= sublist (3:[]) [3]
= sublist [] []
= True
sublist checks if heads of lists are equal. If yes, then it removes them and proceeds (sublist xs ys
). If no, it removes head from the second list (sublist (x:xs) ys
). This way it "finds" the following association:
1 2 3
| | |
| | \-----\
| | |
1 2 4 1 2 3
In other words, to check sublist [1,2,3] ys
for some list ys
it pops elements from ys
as long as they are not 1. Then it pops elements as long as they are not 2. Then it pops elements as long as they are not 3. If [1,2,3]
is exhausted, then it reports True; if ys
is exhausted, it reports False.
Upvotes: 8
Reputation: 8851
It is working because:
[3]
is matched as x:xs
as 3:[]
, [4, 1, 2, 3]
is matched as y:ys
as 4:[1,2,3]
3/=4
so sublist (x:xs) ys
is evaluated, which eventually is Truetrace:
sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
= sublist [2, 3] [2, 4, 1, 2, 3]
= sublist [3] [4, 1, 2, 3]
= sublist [3] [1, 2, 3]
= sublist [3] [2, 3]
= sublist [3] [3]
= sublist [] [] = True
Upvotes: 11