Haskell Implement left outer join

Question

I am working on a uni assignment to implement a left outer join in Haskell using only the prelude function definitions. I have tried using list comprehensions:

ljoin :: Eq a => [(a,b)] -> [(a,c)] -> [(a,b, Maybe c)]
ljoin xs ys = [if x1 == y1 then (x1,x2, Just y2) else (x1,x2, Nothing)| (x1, x2) <- xs, (y1,y2) <- ys]

This produces the following output for the example:

Main> ljoin [(2,"S"),(1,"J")] [(2,1),(2,2),(3,4)]
[(2,"S",Just 1),(2,"S",Just 2),(2,"S",Nothing),(1,"J",Nothing),(1,"J",Nothing),(1,"J",Nothing)]
(399 reductions, 834 cells)

Is anyone able to give me any tips to get rid of the duplicate entries in the result? The (2, "S", Nothing) should not be in the results and only 1 of the (1, "J", Nothing) term should be present.

Thanks in advance.

CR Drost · Accepted Answer

So when you do [f x y | x <- xs, y <- ys] you are doing a Cartesian product or inner join: every element from xs will be paired against every value of ys. That's not what you want.

What you want is to first go over all of the elements of xs, then get a list of all of the values in ys which match, then dispatch over whether that latter list is null, to include a Nothing if you need it, then adjoin x1 and x2 to the elements of that list to make triples. So we start with writing that "general strategy":

ljoin xs ys = concat [map (adjoin x1 x2) $ handleNulls $ getMatches x1 ys 
                          | (x1, x2) <- xs] where

The "concat" is needed because each element of xs will be producing its own list, and we want to "flatten" them all into one big list of results, that's the only thing that was missing from our general strategy. Now we fill in the details:

ljoin xs ys = concat [map (adjoin x1 x2) $ handleNulls $ getMatches x1 ys 
                          | (x1, x2) <- xs] where
    adjoin x1 x2 x3 = (x1, x2, x3)
    handleNulls zs | null zs   = [Nothing]
                   | otherwise = map Just zs
    getMatches x1 ys = [y2 | (y1, y2) <- ys, y1 == x1]

Now there are two ways that you might mean "using only the prelude function definitions": this could mean that you're not allowed any import statements but you're allowed to make the expression as complex as you like (sane!), or it could mean that you're not allowed any other definitions at all, including where (insane!). I don't see any way to do the latter without cheating for the handleNulls step, but you can sort of "hide" the cheating as a function application because it's not a recursive binding:

ljoin xs ys = concat [ zip3 (repeat x1) (repeat x2) . 
                  (\zs -> if null zs then [Nothing] else map Just zs) .
                  map snd . filter ((x1 ==) . fst) $ ys | (x1, x2) <- xs]

But that's just "yuck"; the previous version is much clearer.

Haskell Implement left outer join

Answers (2)

Related Questions