Unique elements from a list according to a subset of fields

Question

Given a record like

data Foo = Foo { fooName :: Text, fooAge :: Int, fooCity :: Text }

Given a list of such elements, is there a function that removes duplicates based on a subset of the fields only, along the lines of this hypothetical removeDupBy function?

foos =
  [
    Foo "john" 32 "London",
    Foo "joe" 18 "New York",
    Foo "john" 22 "Paris",
    Foo "john" 32 "Madrid",
    Foo "joe" 17 "Los Angeles",
    Foo "joe" 18 "Berlin"
  ]

> removeDupBy (\f -> (fooName f, fooAge f)) foos
[
    Foo "john" 32 "London",
    Foo "joe" 18 "New York",
    Foo "john" 22 "Paris",
    Foo "joe" 17 "Los Angeles"
]

I could implement my own, but I would prefer to use one from a well-established library, which is likely to be faster and more robust against edge cases. I was thinking of using nub, but I'm not sure how to map the actual Foo elements to the (fooName, fooAge) tuples that nub would compare.

castletheperson · Accepted Answer

Since the fields you are comparing are only strings and numbers, you can use their Ord instance to remove duplicates efficiently (O(n log n) instead of the O(n²) of nub), or even Hashable, which allows practically constant-time lookups.

Some functions which exactly match your desired signature are:

nubOrdOn from the containers package

Data.Containers.ListUtils> nubOrdOn (\f -> (fooName f, fooAge f)) foos

hashNubOn from the witherable package

Witherable> hashNubOn (\f -> (fooName f, fooAge f)) foos
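To try the nubOrdOn call outside GHCi, here is a minimal sketch of a complete module. The name uniqueByNameAge and the deriving clause are illustrative additions, not part of the question, and OverloadedStrings is assumed so that the string literals can be read as Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Containers.ListUtils (nubOrdOn)
import Data.Text (Text)

data Foo = Foo { fooName :: Text, fooAge :: Int, fooCity :: Text }
  deriving (Show, Eq)

foos :: [Foo]
foos =
  [ Foo "john" 32 "London"
  , Foo "joe" 18 "New York"
  , Foo "john" 22 "Paris"
  , Foo "john" 32 "Madrid"
  , Foo "joe" 17 "Los Angeles"
  , Foo "joe" 18 "Berlin"
  ]

-- Hypothetical helper: keep the first element seen for each
-- (fooName, fooAge) pair, preserving the original list order.
uniqueByNameAge :: [Foo] -> [Foo]
uniqueByNameAge = nubOrdOn (\f -> (fooName f, fooAge f))
```

Evaluating map fooCity (uniqueByNameAge foos) in GHCi should give ["London","New York","Paris","Los Angeles"], which matches the output asked for in the question. Because nubOrdOn keeps the first occurrence of each key, the "Madrid" and "Berlin" entries are the ones dropped.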

You may find other options by searching Hoogle for the type (a -> b) -> [a] -> [a].

If you need to do many operations like this, you may prefer to use Map or HashMap directly.
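As a rough illustration of the Map route, the sketch below indexes the list by the (fooName, fooAge) key using Data.Map.Strict from containers. The names Key, fooKey and indexByNameAge are made up for this example, and OverloadedStrings is assumed for the Text literals.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map.Strict as Map
import Data.Text (Text)

data Foo = Foo { fooName :: Text, fooAge :: Int, fooCity :: Text }
  deriving (Show, Eq)

foos :: [Foo]
foos =
  [ Foo "john" 32 "London"
  , Foo "joe" 18 "New York"
  , Foo "john" 22 "Paris"
  , Foo "john" 32 "Madrid"
  , Foo "joe" 17 "Los Angeles"
  , Foo "joe" 18 "Berlin"
  ]

-- Hypothetical key type: the subset of fields that defines a duplicate.
type Key = (Text, Int)

fooKey :: Foo -> Key
fooKey f = (fooName f, fooAge f)

-- Build an index from key to element. fromListWith calls the combining
-- function as (new value, old value), so returning the old value keeps
-- the first element seen for each key.
indexByNameAge :: [Foo] -> Map.Map Key Foo
indexByNameAge = Map.fromListWith (\_new old -> old) . map (\f -> (fooKey f, f))
```

With the index built once, later queries such as Map.lookup ("john", 32) (indexByNameAge foos) or Map.member reuse it instead of rescanning the list. Note that Map.elems returns the elements in key order rather than in the original list order, so this fits best when ordering does not matter.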
