blue-sky

Reputation: 53826

List.distinct not removing duplicates of class with equals override

Using the code below, I'm attempting to remove duplicate elements with the distinct method on List:

class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code : String) {

  override def toString : String = {
    this.code+","+this.time+","+this.trade_id+","+this.price
  }

}


val l = List(new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"),new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"))

println(l.size)
println(l.distinct.size)

returns :

defined class OrderDetSpecific

l: List[OrderDetSpecific] = List(a,10,1,1.0, a,10,1,1.0)

2
2

But, as you can see, the duplicated elements are not being removed. I assumed the overridden toString method is used when detecting duplicates, and since duplicate entries exist, shouldn't l.distinct.size return 1 instead of 2?

Update :

Converting to case class :

case class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code : String) {

  override def toString : String = {
    this.code+","+this.time+","+this.trade_id+","+this.price
  }

}


val l = List(new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"),new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"))

println(l.size)
println(l.distinct.size)

Now the duplicated elements are removed. Does a case class use value equality in equals, which allows distinct to behave as expected?

When I override equals :

class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code : String) {

  override def toString : String = {
    this.code+","+this.time+","+this.trade_id+","+this.price
  }

  override def equals(that: Any): Boolean =
    that match {
      case that: OrderDetSpecific => {
        time == that.time
      }
      case _ => false
    }

}


val l = List(new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"),new OrderDetSpecific(1.0 , "1" , 1 , 1.0 , "10" , "a"))

println(l.size)
println(l.distinct.size)

The duplicate element is not removed. As my override is on the time attribute, which is not distinct, shouldn't the duplicated element be removed?

Upvotes: 1

Views: 680

Answers (1)

Mario Galic

Reputation: 48420

As my override is on the time attribute, which is not distinct, shouldn't the duplicated element be removed?

No. distinct is equivalent to distinctBy(identity), and distinctBy uses a HashSet, which relies on hashCode (together with equals) to eliminate duplicates. You have overridden equals but not hashCode, so the two instances still produce different hash codes and are treated as different elements. For example, without a hashCode override:

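import scala.collection.mutable

class Foo(var x: Int) {
  // equals is overridden, but hashCode is still the inherited default
  override def equals(obj: Any): Boolean = true
}

val a = new Foo(42)
val b = new Foo(42)

a.## == b.##                     // do the two instances hash alike?
mutable.HashSet(a, b).size == 1  // did the set collapse them into one element?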

outputs

res0: Boolean = false
res1: Boolean = false

whilst with a hashCode override provided

class Foo(var x: Int) {
  override def equals(obj: Any): Boolean = true
  override def hashCode(): Int = scala.runtime.Statics.anyHash(x)
}
...

we get

res0: Boolean = true
res1: Boolean = true
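Applied to the class from the question, the same idea might look like the sketch below. It assumes that time alone should decide equality, as in the question's equals override, and derives hashCode from that same field so that equal objects hash alike. The deduped name is only illustrative.

```scala
class OrderDetSpecific(var size: Double,
                       var side: String,
                       var trade_id: Int,
                       var price: Double,
                       var time: String,
                       var code: String) {

  override def toString: String =
    this.code + "," + this.time + "," + this.trade_id + "," + this.price

  // Equality is keyed on `time` only, as in the question.
  override def equals(that: Any): Boolean =
    that match {
      case that: OrderDetSpecific => time == that.time
      case _                      => false
    }

  // Derived from the same field as equals, so equal objects share a hash code.
  override def hashCode(): Int = time.##
}

val l = List(
  new OrderDetSpecific(1.0, "1", 1, 1.0, "10", "a"),
  new OrderDetSpecific(1.0, "1", 1, 1.0, "10", "a")
)

// Both elements are equal and hash alike, so distinct keeps only the first.
val deduped = l.distinct
```

Here l.size is 2 and deduped.size is 1. Because time is a var, changing it after an instance has been placed in a hash-based collection would change its hash code, so treat this as a sketch rather than a recommended design.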

However, there is no need to fiddle with these overrides. Instead, try distinctBy (available since Scala 2.13):

l.distinctBy(_.time)

which outputs

res0: List[OrderDetSpecific] = List(a,10,1,1.0)
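On the case class part of the question: yes, for a case class the compiler generates both equals and hashCode from the constructor parameters, which is why distinct worked after the conversion. A minimal sketch, using a hypothetical two-field Order class as a stand-in for OrderDetSpecific:

```scala
// Hypothetical, cut-down stand-in for the question's class.
case class Order(time: String, code: String)

val a = Order("10", "a")
val b = Order("10", "a")

val sameValue     = a == b        // generated equals compares the fields
val sameHash      = a.## == b.##  // generated hashCode is built from the same fields
val sameReference = a eq b        // reference identity: still two separate objects
val dedupedSize   = List(a, b).distinct.size
```

sameValue and sameHash are true, sameReference is false, and dedupedSize is 1.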

Upvotes: 4
