cambraca
cambraca

Reputation: 27835

Removing elements from [[Int]] list if the numbers appeared before in the same list in Haskell

I have something like this:

[[1,3,2],[5,6],[3,8],[10,11],[13,6]]

And I want to filter out the third and last list, so that it becomes:

[[1,3,2],[5,6],[10,11]]

The logic is to remove the lists if any of the numbers "appeared before" (in the case of [3,8], the number 3 appears in the first list).

I can only think of non-FP methods to do this. My goal is to learn Haskell in a "good way". How can this be done?

Upvotes: 1

Views: 72

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476709

What about this: you use an accumulator like a Data.HashSet and then use some kind of filter approach like:

import Data.HashSet
import Data.Hashable

somefilter :: (Hashable a, Ord a) => [[a]] -> [[a]]
somefilter = sf empty

sf :: (Hashable a, Ord a) => Set a -> [[a]] -> [[a]]
sf hs (x:xs) | any (flip member hs) x = tl
             | otherwise = x : tl
             where tl = sf (foldr insert hs x) xs
sf _ [] = []

Then the demo:

*Main> somefilter [[1,3,2],[5,6],[3,8],[10,11],[13,6]]
[[1,3,2],[5,6],[10,11]]

You can of course use another datastructure (list, treeset, whatever) to store the values you've encountered thus far. A HashSet is simply a reasonable choice because of the low lookup and insertion time complexity.

Edit

Evidently time complexity is not easy to measure for lazy programming. If you intend to generate the entire list, the time complexities can be derived from the documentation of Data.IntMap.

Given that the total number of elements is N (not to be confused with the number of lists), and we consider the number of bits of the Int W fixed (32 or 64), both insert and lookup are done in constant time. This means that the total time complexity is O(N).

For the nubBy solution, although the time complexity for intersect this probably runs in O(m n) with m the number of items in one list, and n the number of elements in another list. This is done for each list against all previous. So in case there are O(k) lists, the time complexity is O(k2 a2) with a the average number of items in a list. Since a k=N, the time complexity is thus approximately O(N2).

Edit 2

Based on your comment, you can rewrite the function in order to meet the new specifications:

import Data.HashSet
import Data.Hashable

somefilter2 :: (Hashable a, Ord a) => [[a]] -> [[a]]
somefilter2 = sf2 empty

sf2 :: (Hashable a, Ord a) => Set a -> [[a]] -> [[a]]
sf2 hs (x:xs) | any (flip member hs) x = sf2 hs xs
              | otherwise = x : sf2 (foldr insert hs x) xs
sf2 _ [] = []

With the difference between somefilter and somefilter2:

*Main> somefilter [[1,3,2],[5,6],[3,8],[10,11],[8],[13,6]]
[[1,3,2],[5,6],[10,11]]
*Main> somefilter2 [[1,3,2],[5,6],[3,8],[10,11],[8],[13,6]]
[[1,3,2],[5,6],[10,11],[8]]

there is only a difference in the recursive part of sf: now you only add elements to the Set in case you "emit" an element.

Upvotes: 5

Related Questions