Reputation: 1241
How do I go about writing a SQL-like full join operation on lists with the below signature?
fullJoin :: [a] -> [b] -> (a -> b -> Bool) -> [(Maybe a, Maybe b)]
fullJoin xs [] _ = map (\x -> (Just x, Nothing)) xs
fullJoin [] ys _ = map (\y -> (Nothing, Just y)) ys
fullJoin xs@(xh:xt) ys@(yh:yt) on =
if on xh yh
then (Just xh, Just yh) : fullJoin xt yt
else ???
The condition as written is provided by the a -> b -> Bool
. here in the example it's set as (\n i -> length n == i)
, meaning the records from names
which have the same length as the numbers in nums
.
Example:
names = ["Foo","Bar", "John" , "Emily", "Connor"]
nums = [1,2,3,4,5]
fullJoin names nums (\n i -> length n == i)
== [ (Just "Foo", Just 3)
, (Just "Bar", Just 3)
, (Just "John", Just 4)
, (Just "Emily", Just 5)
, (Just "Connor", Nothing)
, (Nothing, Just 1)
, (Nothing, Just 2)
]
To clarify the exact SQL equivalent of the said function, here's how it would be written in PostgreSQL:
create table names as
select * from (values
('Foo'),
('Bar'),
('John'),
('Emily'),
('Connor')
)
as z(name);
create table nums as
select * from (values
(1),
(2),
(3),
(4),
(5)
)
as z(num);
select * from names
full join nums
on char_length(name) = num
And running this will yield:
name num
(null) 1
(null) 2
Foo 3
Bar 3
John 4
Emily 5
Connor (null)
Fiddle link : sqlfiddle
Upvotes: 3
Views: 287
Reputation: 71065
Just dully implement what it is you want it to do. Super inefficient:
fullJoin :: [a] -> [b] -> (a -> b -> Bool)
-> [(Maybe a, Maybe b)]
fullJoin xs ys p = concatMap (map (Just *** Just) . snd) a
++ [(Just x, Nothing) | (x, []) <- a]
++ [(Nothing, Just y) | (y, []) <- b]
where
a = [(x, [(x, y) | y <- ys, p x y]) | x <- xs]
b = [(y, [(x, y) | x <- xs, p x y]) | y <- ys]
If we were allowed to add the Eq
constraints on the a
and b
types, we could use Data.List.\\
to find the unmatched elements instead of making the second sweep.
Testing:
> mapM_ print $ fullJoin names nums (\n i -> length n == i)
(Just "Foo",Just 3)
(Just "Bar",Just 3)
(Just "John",Just 4)
(Just "Emily",Just 5)
(Just "Connor",Nothing)
(Nothing,Just 1)
(Nothing,Just 2)
Upvotes: 1
Reputation: 46409
Here is a naive Python implementation. It is O(n^2)
.
#! /usr/bin/env python
def full_join_cond (list1, list2, condition_fn):
answer = []
for x in list1:
matches = []
for y in list2:
if condition_fn(x, y):
matches.append(y)
if len(matches):
answer.extend([[x, y] for y in matches])
else:
answer.append([x, None])
for y in list2:
matched = False
for x in list1:
if condition_fn(x, y):
matched = True
break
if not matched:
answer.append([None, y])
return answer
names = ["Foo","Bar", "John" , "Emily", "Connor"]
nums = [1,2,3,4,5]
cond = lambda x, y: len(x) == y
print(full_join_cond(names, nums, cond))
And here is an implementation that more closely matches how a database would perform this join. It is O(size_of_output)
which is frequently O(n)
.
def list_map_to_dict (l, m):
d = {}
for x in l:
mx = m(x)
if mx in d:
d[mx].append(x)
else:
d[mx] = [x]
return d
def full_join_map (list1, map1, list2, map2):
dict1 = list_map_to_dict(list1, map1)
dict2 = list_map_to_dict(list2, map2)
answer = []
for mx, xs in dict1.iteritems():
if mx in dict2:
for y in dict2[mx]:
answer.extend([[x, y] for x in xs])
dict2.pop(mx)
else:
answer.extend([[x, None] for x in xs])
for my, ys in dict2.iteritems():
answer.extend([[None, y] for y in ys])
return answer
print(full_join_map(names, lambda x: len(x), nums, lambda y: y))
And for giggles and grins the two can be combined in a function that both is general and can run fast. (I have not tried to make the function signature reasonable - just trying to show the idea.)
def full_join (list1, map1, list2, map2, cond=None):
if map1 is None:
map1 = lambda x: None
if map2 is None:
map2 = lambda y: None
if cond is None:
cond = lambda x, y: True
dict1 = list_map_to_dict(list1, map1)
dict2 = list_map_to_dict(list2, map2)
answer = []
for mx, xs in dict1.iteritems():
if mx in dict2:
answer.extend(full_join_cond(xs, dict2[mx], cond))
dict2.pop(mx)
else:
answer.extend([[x, None] for x in xs])
for my, ys in dict2.iteritems():
answer.extend([[None, y] for y in ys])
return answer
Upvotes: 1
Reputation: 780
A full outer join between tables X
and Y
on a condition rel(x, y)
can be thought of as the union of three (disjoint) parts:
(x, y)
where rel(x,y)
(x0, null)
where for all y in Y
, not (rel(x0, y))
(null, y0)
where for all x in X
, not (rel(x, y0))
We can structure our Haskell program in a similar way:
fullJoin :: [a] -> [b] -> (a -> b -> Bool) -> [(Maybe a, Maybe b)]
fullJoin xs ys rel = concat $
[[(Just x, Just y) | y <- ys, x `rel` y] | x <- xs] -- for each x, find all the related ys
<> [[(Just x, Nothing)] | x <- xs, all (\y -> not (x `rel` y)) ys] -- find all xs that have no related ys
<> [[(Nothing, Just y)] | y <- ys, all (\x -> not (x `rel` y)) xs] -- find all ys that have no related xs
The way the problem is posed, you can't get this faster than O(n^2)
. That said, my solution, while being O(n^2)
, it not optimal: it traverses xs
twice. You can think about what you'd have to do in order to only traverse xs
once. It has to do with finding a way to keep track of which xs
you've already seen.
Upvotes: 2