Reputation: 8601
I have a List[A]
, how is a idiomatic way of removing duplicates given an equality function (a:A, b:A) => Boolean
? I cannot generally override equals
for A
The way I can think now is creating a wrapping class AExt
with overridden equals
, then
list.map(new AExt(_)).distinct
But I wonder if there's a cleaner way.
Upvotes: 16
Views: 10152
Reputation: 61666
Starting Scala 2.13
, we can use the new distinctBy
method which returns elements of a sequence ignoring the duplicates as determined by ==
after applying a transforming function f
:
def distinctBy[B](f: (A) => B): List[A]
For instance:
// case class A(a: Int, b: String, c: Double)
// val list = List(A(1, "hello", 3.14), A(2, "world", 3.14), A(1, "hello", 12.3))
list.distinctBy(x => (x.a, x.b)) // List(A(1, "hello", 3.14), A(2, "world", 3.14))
list.distinctBy(_.c) // List(A(1, "hello", 3.14), A(1, "hello", 12.3))
Upvotes: 3
Reputation: 2913
Using the Foo
and customEquals
from misingFaktor's answer:
case class Foo(a: Int, b: Int)
val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
def customEquals(x: Foo, y: Foo) = x.a == y.a
(Seq(a, b, c, d).foldLeft(Seq[Foo]()) {
(unique, curr) => {
if (!unique.exists(customEquals(curr, _)))
curr +: unique
else
unique
}
}).reverse
If result ordering is important but the duplicate to be removed is not, then foldRight is preferable
Seq(a, b, c, d).foldRight(Seq[Foo]()) {
(curr, unique) => {
if (!unique.exists(customEquals(curr, _)))
curr +: unique
else
unique
}
}
Upvotes: 8
Reputation: 199
There is a simple (simpler) way to do this:
list.groupBy(_.key).mapValues(_.head)
If you want you can use the resulting map instantly by replacing _.head
by a function block like:
sameElements => { val observedItem = sameElements.head
new A (var1 = observedItem.firstAttr,
var2 = "SomethingElse") }
to return a new A
for each distinct element.
There is only one minor problem. The above code (list.groupBy(_.key).mapValues(_.head)
) didnt explains very well the intention to remove duplicates. For that reason it would be great to have a function like distinctIn[A](attr: A => B)
or distinctBy[A](eq: (A, A) -> Boolean)
.
Upvotes: 19
Reputation: 134270
I must say I think I'd go via an intermediate collection which was a Set
if you expected that your List
s might be quite long as testing for presence (via exists
or find
) on a Seq
is O(n) of course:
Rather than write a custom equals; decide what property the elements are equal by. So instead of:
def myCustomEqual(a1: A, a2: A) = a1.foo == a2.foo && a1.bar == a2.bar
Make a Key. Like so:
type Key = (Foo, Bar)
def key(a: A) = (a.foo, a.bar)
Then you can add the keys to a Set
to see whether you have come across them before.
var keys = Set.empty[Key]
((List.empty[A] /: as) { (l, a) =>
val k = key(a)
if (keys(k)) l else { keys += k; a +: l }
}).reverse
Of course, this solution has worse space complexity and potentially worse performance (as you are creating extra objects - the keys) in the case of very short lists. If you do not like the var
in the fold, you might like to look at how you could achieve this using State
and Traverse
from scalaz 7
Upvotes: 4
Reputation: 162
case class Foo (a: Int, b: Int)
val x = List(Foo(3,4), Foo(3,1), Foo(2,5), Foo(2,5))
def customEquals(x : Foo, y: Foo) = (x.a == y.a && x.b == y.b)
x.foldLeft(Nil : List[Foo]) {(list, item) =>
val exists = list.find(x => customEquals(item, x))
if (exists.isEmpty) item :: list
else list
}.reverse
res0: List[Foo] = List(Foo(3,4), Foo(3,1), Foo(2,5))
Upvotes: -1
Reputation: 92056
scala> case class Foo(a: Int, b: Int)
defined class Foo
scala> val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
a: Foo = Foo(3,4)
b: Foo = Foo(3,1)
c: Foo = Foo(2,5)
d: Foo = Foo(2,5)
scala> def customEquals(x: Foo, y: Foo) = x.a == y.a
customEquals: (x: Foo, y: Foo)Boolean
scala> Seq(a, b, c, d) filter {
| var seq = Seq.empty[Foo]
| x => {
| if(seq.exists(customEquals(x, _))) {
| false
| } else {
| seq :+= x
| true
| }
| }
res13: Seq[Foo] = List(Foo(3,4), Foo(2,5))
Upvotes: 3