Reputation:
I would like to do set difference between 2 integer lists, which allow repetitions in haskell.
So in case of having [1,2,1,4,3] [1,2,4]
, the difference would be [1,3]
Currently I can do it via the normal \\
operator listA \\ listB
.
But the problem is this too slow. As integers are in ord group it can be done faster.
I am aware of the Data.Multiset
module, which would do it faster, but is there a native way to do it on lists without the Data.Multiset
module?
Upvotes: 4
Views: 724
Reputation:
As I could not use Data.MultiSet
I needed to imeplement my own solution via lists, so I will post the solution.
In order to get the set difference of two lists with repetitons, we first need to group the numbers together, to get the amount of repetitions of each number:
tupleGroup [] _ = []
tupleGroup (hd:tl) (x, y)
| hd == x = (x, succ y) : tupleGroup tl (x, succ y)
| otherwise = (hd, 1) : tupleGroup tl (hd, 1)
group' [] = []
group' (hd:tl) = tupleGroup tl (hd, 1)
In order to call the helper function group' we first need to sort the list, as tupleGroup needs a sorted list.
Next function is the difference function, in which we supply the grouped lists:
difference [] [] = []
difference (hd:tl) [] = fst hd : difference tl []
difference [] (hd:tl) = []
difference (hd1:tl1) (hd2:tl2)
| x1 == x2 && y1 > y2 = x1 : difference tl1 tl2
| x1 == x2 = difference tl1 tl2
| x1 < x2 = x1 : difference tl1 (hd2:tl2)
| otherwise = difference (hd1:tl1) tl2
where
x1 = fst hd1
x2 = fst hd2
y1 = snd hd1
y2 = snd hd2
The function will return a list of all numbers, which the repetition count is bigger in the first list, than in the second one.
Upvotes: 1
Reputation: 39390
As integers are in ord group it can be done faster.
Yes, but it requires building a sorted index. And that's pretty much what Data.Multiset
is doing. You could of course write an operation that does that manually, but you'd be effectively reimplementing Multiset
by then.
Upvotes: 6