Johnny Everson
Johnny Everson

Reputation: 8601

Remove duplicates in List specifying equality function

I have a List[A], how is a idiomatic way of removing duplicates given an equality function (a:A, b:A) => Boolean? I cannot generally override equalsfor A

The way I can think now is creating a wrapping class AExt with overridden equals, then

list.map(new AExt(_)).distinct

But I wonder if there's a cleaner way.

Upvotes: 16

Views: 10152

Answers (6)

Xavier Guihot
Xavier Guihot

Reputation: 61666

Starting Scala 2.13, we can use the new distinctBy method which returns elements of a sequence ignoring the duplicates as determined by == after applying a transforming function f:

def distinctBy[B](f: (A) => B): List[A]

For instance:

// case class A(a: Int, b: String, c: Double)
// val list = List(A(1, "hello", 3.14), A(2, "world", 3.14), A(1, "hello", 12.3))
list.distinctBy(x => (x.a, x.b)) // List(A(1, "hello", 3.14), A(2, "world", 3.14))
list.distinctBy(_.c)             // List(A(1, "hello", 3.14), A(1, "hello", 12.3))

Upvotes: 3

drstevens
drstevens

Reputation: 2913

Using the Foo and customEquals from misingFaktor's answer:

  case class Foo(a: Int, b: Int)
  val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
  def customEquals(x: Foo, y: Foo) = x.a == y.a

  (Seq(a, b, c, d).foldLeft(Seq[Foo]()) {
    (unique, curr) => {
      if (!unique.exists(customEquals(curr, _)))
        curr +: unique
      else
        unique
    }
  }).reverse

If result ordering is important but the duplicate to be removed is not, then foldRight is preferable

  Seq(a, b, c, d).foldRight(Seq[Foo]()) {
    (curr, unique) => {
      if (!unique.exists(customEquals(curr, _)))
        curr +: unique
      else
        unique
    }
  }

Upvotes: 8

luno1977
luno1977

Reputation: 199

There is a simple (simpler) way to do this:

list.groupBy(_.key).mapValues(_.head)

If you want you can use the resulting map instantly by replacing _.head by a function block like:

sameElements => { val observedItem = sameElements.head
                  new A (var1 = observedItem.firstAttr,
                         var2 = "SomethingElse") }

to return a new A for each distinct element.

There is only one minor problem. The above code (list.groupBy(_.key).mapValues(_.head)) didnt explains very well the intention to remove duplicates. For that reason it would be great to have a function like distinctIn[A](attr: A => B) or distinctBy[A](eq: (A, A) -> Boolean).

Upvotes: 19

oxbow_lakes
oxbow_lakes

Reputation: 134270

I must say I think I'd go via an intermediate collection which was a Set if you expected that your Lists might be quite long as testing for presence (via exists or find) on a Seq is O(n) of course:

Rather than write a custom equals; decide what property the elements are equal by. So instead of:

def myCustomEqual(a1: A, a2: A) = a1.foo == a2.foo && a1.bar == a2.bar

Make a Key. Like so:

type Key = (Foo, Bar)
def key(a: A) = (a.foo, a.bar)

Then you can add the keys to a Set to see whether you have come across them before.

var keys = Set.empty[Key]
((List.empty[A] /: as) { (l, a) => 
  val k = key(a)
  if (keys(k)) l else { keys += k; a +: l  }
}).reverse

Of course, this solution has worse space complexity and potentially worse performance (as you are creating extra objects - the keys) in the case of very short lists. If you do not like the var in the fold, you might like to look at how you could achieve this using State and Traverse from scalaz 7

Upvotes: 4

RStrad
RStrad

Reputation: 162

case class Foo (a: Int, b: Int)

val x = List(Foo(3,4), Foo(3,1), Foo(2,5), Foo(2,5))
def customEquals(x : Foo, y: Foo) = (x.a == y.a && x.b == y.b)

x.foldLeft(Nil : List[Foo]) {(list, item) => 
   val exists = list.find(x => customEquals(item, x))
   if (exists.isEmpty) item :: list
   else list
 }.reverse

res0: List[Foo] = List(Foo(3,4), Foo(3,1), Foo(2,5))

Upvotes: -1

missingfaktor
missingfaktor

Reputation: 92056

scala> case class Foo(a: Int, b: Int)
defined class Foo

scala> val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
a: Foo = Foo(3,4)
b: Foo = Foo(3,1)
c: Foo = Foo(2,5)
d: Foo = Foo(2,5)

scala> def customEquals(x: Foo, y: Foo) = x.a == y.a
customEquals: (x: Foo, y: Foo)Boolean

scala> Seq(a, b, c, d) filter {
     |   var seq = Seq.empty[Foo]
     |   x => {
     |    if(seq.exists(customEquals(x, _))) {
     |      false 
     |    } else { 
     |      seq :+= x
     |      true 
     |    }
     | }
res13: Seq[Foo] = List(Foo(3,4), Foo(2,5))

Upvotes: 3

Related Questions