StrickBan
StrickBan

Reputation: 37

Implementing own Union function without going through lists twice

I have to write a Union function using recursion

The ouput has to be the Union (no duplicates) of two lists. My teacher said the implementation has to be recursive and we cannot go through the lists twice but I don't think I can come up with a way of solving the problem without going through the lists twice?

My ideas which would solve the problem (but involve going through lists twice): - Merge then remove duplicates - Sorting the lists, then merge

Any hints or help would be appreciated

Edit: Well so I got to combine both lists by doing this:

union1 :: (Eq a) => [a] -> [a] -> [a]
union1 xs [] = xs
union1 [] ys = ys
union1 (x:xs)(y:ys) = x:y:union1(xs)(ys)

Then I thought I could use nub or a similar function to remove the duplicates but I got stuck thinking because then I would be going through the lists twice, right?

Upvotes: 0

Views: 715

Answers (1)

MikaelF
MikaelF

Reputation: 3645

What is list union?

I would like to first point out that the requirements your teacher gave you are a bit vague. Moreover, union on multisets (aka sets that can have duplicates, like lists) have two different definitions in mathematics (other source). I am no mathematician, but here is what I was able to glean from various internets. Here is one definition:

λ> [1,2,2,3,3,3] `unionA` [1,2,2,2,3] --also called multiset sum
[1,1,2,2,2,2,2,3,3,3,3]

This is simply (++), if you're not worried about ordering. And here is the other:

λ> [1,2,2,3,3,3] `unionB` [1,2,2,2,3]
[1,2,2,2,3,3,3] --picks out the max number of occurrences from each list

Adding to this confusion, Data.List implements a somewhat quirky third type of union, that treats its left input differently from its right input. Here is approximately the documentation found in comments the source code of union from Data.List:

The union function returns the list union of the two lists. Duplicates, and elements of the first list, are removed from the the second list, but if the first list contains duplicates, so will the result. For example,

  λ> "dog" `union` "cow" 
  
  "dogcw"

Here, you have 3 possible meanings of "union of lists" to choose from. So unless you had example input and output, I don't know which one your teacher wants, but since the goal is probably for you to learn about recursion, read on...


Removing duplicates

Removing duplicates from unordered lists in Haskell can be done in linear time, but solutions involve either random-access data structures, like Arrays, or something called "productive stable unordered discriminators", like in the discrimination package. I don't think that's what your teacher is looking for, since the first is also common in imperative programming, and the second is too complex for a beginner Haskell course. What your teacher probably meant is that you should traverse each list explicitly only once.

So now for the actual code. Here I will show you how to implement union #3, starting from what you wrote, and still traverse the lists (explicitly) only once. You already have the basic recursion scheme, but you don't need to recurse on both lists, since we opted for option #3, and therefore return the first list unchanged.


Actual code

You'll see in the code below that the first list is used as an "accumulator": while recursing on the second list, we check each element for a duplicate in the first list, and if there isn't a duplicate, we append it to the first list.

union [] r = r
union l [] = l
unionR ls (r:rs)
  | r `elem` ls = unionR ls rs   --discard head of second list if already seen
                                 --`elem` traverses its second argument,
                                 --but see discussion above.
  | otherwise = unionR (r:ls) rs --append head of second list

As a side note, you can make this a bit more readable by using a fold:

union xs = foldl step xs where --xs is our first list; no recursion on it,
                               --we use it right away as the accumulator.
  step els x
    | x `elem` els = els
    | otherwise = x : els

Upvotes: 1

Related Questions