Marzack

Reputation: 57

Scala: Remove duplicated integers from Vector( tuples(Int,Int) , ...)

I have a large Vector (about 2000 elements) made up of tuples of type (Int, Int), e.g.

val myVectorEG = Vector((65,61), (29,49), (4,57), (12,49), (24,98), (21,52), (81,86), (91,23), (73,34), (97,41), ...)

I want to remove the tuples whose first element (index 0) is repeated, i.e. if a tuple (65, xx) appears and another tuple (65, yy) also appears in the vector, it should be removed.

I am able to access the elements and print them like this:

val (id1,id2) = ( allSource.foreach(i=>println(i._1)),  allSource.foreach(i=>i._2))

How can I remove the duplicated integers? Or should I use another method, rather than foreach, to access the element at index 0?

Upvotes: 0

Views: 196

Answers (4)

Another option, taking advantage of the fact that you want the list sorted at the end.

def sortAndRemoveDuplicatesByFirst[A : Ordering, B](input: List[(A, B)]): List[(A, B)] = {
  import Ordering.Implicits._

  val sorted = input.sortBy(_._1)

  @annotation.tailrec
  def loop(remaining: List[(A, B)], previous: (A, B), repeated: Boolean, acc: List[(A, B)]): List[(A, B)] =
    remaining match {
      case x :: xs =>
        if (x._1 == previous._1)
          loop(remaining = xs, previous, repeated = true, acc)
        else if (!repeated)
          loop(remaining = xs, previous = x, repeated = false, previous :: acc)
        else
          loop(remaining = xs, previous = x, repeated = false, acc)

      case Nil =>
        // Keep the last element only if its key was not repeated.
        if (repeated) acc.reverse else (previous :: acc).reverse
    }

  sorted match {
    case x :: xs =>
      loop(remaining = xs, previous = x, repeated = false, acc = List.empty)

    case Nil =>
      List.empty
  }
}

Which you can test like this:

val data = List(
  1 -> "A",
  3 -> "B",
  1 -> "C",
  4 -> "D",
  3 -> "E",
  5 -> "F",
  1 -> "G",
  0 -> "H"
)

sortAndRemoveDuplicatesByFirst(data)
// res: List[(Int, String)] = List((0,H), (4,D), (5,F))

(I used List instead of Vector to make it easy and performant to write the tail-rec algorithm)

Upvotes: 1

Ivan Stanislavciuc

Reputation: 7275

You can use distinctBy (available since Scala 2.13) to remove duplicates. It keeps the first element found for each key.

In the case of Vector[(Int, Int)] it will look like this:

myVectorEG.distinctBy(_._1)
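To illustrate what distinctBy does with repeated keys, here is a minimal sketch. The sample data and the object name DistinctByDemo are made up for illustration and are not from the question.

```scala
object DistinctByDemo {
  // Hypothetical sample data: keys 65 and 29 each appear twice.
  val sample: Vector[(Int, Int)] =
    Vector((65, 61), (29, 49), (65, 10), (4, 57), (29, 7))

  // distinctBy (Scala 2.13+) keeps the first tuple seen for each key
  // and drops later tuples with the same key.
  val firstPerKey: Vector[(Int, Int)] = sample.distinctBy(_._1)

  def main(args: Array[String]): Unit =
    println(firstPerKey) // Vector((65,61), (29,49), (4,57))
}
```

Note that one tuple per key survives, so (65,61) and (29,49) are still in the result.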

Update: if you need to remove every tuple whose key is duplicated:

You can use groupBy, but this does not preserve the original order.

myVectorEG.groupBy(_._1).filter(_._2.size == 1).flatMap(_._2).toVector

Upvotes: 2

Tim

Reputation: 27421

This does the job and preserves order (unlike other solutions) but is O(n^2) so potentially slow for 2000 elements:

myVectorEG.filter(x => myVectorEG.count(_._1 == x._1) == 1)

This is more efficient for larger vectors but still preserves order:

val keep =
  myVectorEG.groupBy(_._1).collect{
    case (k, v) if v.size == 1 => k
  }.toSet

myVectorEG.filter(x => keep.contains(x._1))
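As a rough sketch, the same idea can be wrapped in a reusable function. The name keepUniqueByFirst and the sample data below are hypothetical, chosen only for illustration; the function counts each key once and then filters the original vector, so the original order is kept.

```scala
object UniqueKeysDemo {
  // Hypothetical helper: keeps only the tuples whose first element
  // occurs exactly once in the input, in their original order.
  def keepUniqueByFirst[A, B](xs: Vector[(A, B)]): Vector[(A, B)] = {
    // Number of tuples per key.
    val counts: Map[A, Int] =
      xs.groupBy(_._1).map { case (k, v) => k -> v.size }
    // Filtering the original vector preserves its order.
    xs.filter { case (k, _) => counts(k) == 1 }
  }

  // Hypothetical sample data: key 65 appears twice.
  val sample: Vector[(Int, Int)] =
    Vector((65, 61), (29, 49), (65, 10), (4, 57), (12, 49))

  def main(args: Array[String]): Unit =
    println(keepUniqueByFirst(sample)) // Vector((29,49), (4,57), (12,49))
}
```

Both tuples with key 65 are dropped, and the remaining tuples come out in the order they had in the input.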

Upvotes: 3

CervEd

Reputation: 4292

To remove all duplicates, first group by the first element of each tuple (_._1), then collect only the groups that contain exactly one tuple for their key. Finally, flatten the result.

myVectorEG.groupBy(_._1).collect{
  case (k, v) if v.size == 1 => v
}.flatten

The result is typed as an Iterable (in practice a List); call .toVector on it if you need a Vector.

Upvotes: 3
