rtruszk
rtruszk

Reputation: 3922

How to remove 2 or more duplicates from list and maintain their initial order?

Lets assume we have a Scala list:

val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)

We can easily remove duplicates using the following code:

l1.distinct

or

l1.toSet.toList

But what if we want to remove duplicates only if there are more than 2 of them? So if there are more than 2 elements with the same value we remain only two and remove the rest of them.

I could achieve it with following code:

l1.groupBy(identity).mapValues(_.take(2)).values.toList.flatten

that gave me the result:

List(2, 2, 5, 1, 1, 3, 3)

Elements are removed but the order of remaining elements is different from how these elements appeared in the initial list. How to do this operation and remain the order from original list?

So the result for l1 should be:

List(1, 2, 3, 1, 3, 2, 5)

Upvotes: 18

Views: 2670

Answers (13)

user5102379
user5102379

Reputation: 1512

I prefer approach with immutable Map:

  def noMoreThan[T](list: List[T], max: Int): List[T] = {
    def go(tail: List[T], freq: Map[T, Int]): List[T] = {
      tail match {
        case h :: t =>
          if (freq(h) < max)
            h :: go(t, freq + (h -> (freq(h) + 1)))
          else go(t, freq)
        case _ => Nil
      }
    }
    go(list, Map[T, Int]().withDefaultValue(0))
  }

Upvotes: 1

dk14
dk14

Reputation: 22374

Solution with groupBy and filter, without any sorting (so it's O(N), sorting will give you additional O(Nlog(N)) in typical case):

val li = l1.zipWithIndex
val pred = li.groupBy(_._1).flatMap(_._2.lift(1)) //1 is your "2", but - 1
for ((x, i) <- li if !pred.get(x).exists(_ < i)) yield x

Upvotes: 1

Jin
Jin

Reputation: 213

Similar to how distinct is implemeted, with a multiset instead of a set:

def noMoreThan[T](list : List[T], max : Int) = {
    val b = List.newBuilder[T]
    val seen = collection.mutable.Map[T,Int]().withDefaultValue(0)
    for (x <- list) {
      if (seen(x) < max) {
        b += x
        seen(x) += 1
      }
    }
    b.result()
  }

Upvotes: 5

phadej
phadej

Reputation: 12123

distinct is defined for SeqLike as

/** Builds a new $coll from this $coll without any duplicate elements.
 *  $willNotTerminateInf
 *
 *  @return  A new $coll which contains the first occurrence of every element of this $coll.
 */
def distinct: Repr = {
  val b = newBuilder
  val seen = mutable.HashSet[A]()
  for (x <- this) {
    if (!seen(x)) {
      b += x
      seen += x
    }
  }
  b.result()
}

We can define our function in very similar fashion:

def distinct2[A](ls: List[A]): List[A] = {
  val b = List.newBuilder[A]
  val seen1 = mutable.HashSet[A]()
  val seen2 = mutable.HashSet[A]()
  for (x <- ls) {
    if (!seen2(x)) {
      b += x
      if (!seen1(x)) {
        seen1 += x
      } else {
        seen2 += x
      }
    }
  }
  b.result()
}

scala> distinct2(l1)
res4: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

This version uses internal state, but is still pure. It is also quite easy to generalise for arbitrary n (currently 2), but specific version is more performant.

You can implement the same function with folds carrying the "what is seen once and twice" state with you. Yet the for loop and mutable state does the same job.

Upvotes: 4

Bob Dalgleish
Bob Dalgleish

Reputation: 8227

Here is canonical Scala code to do reduce three or more in a row to two in a row:

  def checkForTwo(candidate: List[Int]): List[Int] = {
    candidate match {
      case x :: y :: z :: tail if x == y && y == z =>
        checkForTwo(y :: z :: tail)
      case x :: tail => 
        x :: checkForTwo(tail)
      case Nil =>
        Nil
    }
  }

It looks at the first three elements of the list, and if they are the same, drops the first one and repeats the process. Otherwise, it passes items on through.

Upvotes: 1

Soumya Simanta
Soumya Simanta

Reputation: 11741

Not the most efficient.

scala> val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)
l1: List[Int] = List(1, 2, 3, 1, 1, 3, 2, 5, 1)

scala> l1.zipWithIndex.groupBy( _._1 ).map(_._2.take(2)).flatten.toList.sortBy(_._2).unzip._1
res10: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Upvotes: 11

Jatin
Jatin

Reputation: 31724

Its a bit ugly, but its relatively faster

val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)
l1.foldLeft((Map[Int, Int](), List[Int]())) { case ((m, ls), x) => {
  val z = m + ((x, m.getOrElse(x, 0) + 1))
  (z, if (z(x) <= 2) x :: ls else ls)
}}._2.reverse

Gives: List(1, 2, 3, 1, 3, 2, 5)

Upvotes: 2

Dima
Dima

Reputation: 40500

How about this:

list
 .zipWithIndex
 .groupBy(_._1)
 .toSeq
 .flatMap { _._2.take(2) }       
 .sortBy(_._2)
 .map(_._1)

Upvotes: 3

The Archetypal Paul
The Archetypal Paul

Reputation: 41759

More straightforward version using foldLeft:

l1.foldLeft(List[Int]()){(acc, el) => 
     if (acc.count(_ == el) >= 2) acc else el::acc}.reverse

Upvotes: 6

Daniel Hinojosa
Daniel Hinojosa

Reputation: 992

My humble answer:

def distinctOrder[A](x:List[A]):List[A] = {
    @scala.annotation.tailrec
    def distinctOrderRec(list: List[A], covered: List[A]): List[A] = {
       (list, covered) match {
         case (Nil, _) => covered.reverse
         case (lst, c) if c.count(_ == lst.head) >= 2 => distinctOrderRec(list.tail, covered)
         case _ =>  distinctOrderRec(list.tail, list.head :: covered)
       }
    }
    distinctOrderRec(x, Nil)
}

With the results:

scala> val l1 = List(1, 2, 3, 1, 1, 3, 2, 5, 1)
l1: List[Int] = List(1, 2, 3, 1, 1, 3, 2, 5, 1)

scala> distinctOrder(l1)
res1: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

On Edit: Right before I went to bed I came up with this!

l1.foldLeft(List[Int]())((total, next) => if (total.count(_ == next) >= 2) total else total :+ next)

With an answer of:

res9: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Upvotes: 7

Dimitri
Dimitri

Reputation: 1786

Based on experquisite's answer, but using foldLeft:

def noMoreThanBis(xs: List[Int], max: Int) = {
  val initialState: (Map[Int, Int], List[Int]) = (Map().withDefaultValue(0), Nil)
  val (_, result) = xs.foldLeft(initialState) { case ((count, res), x) =>
    if (count(x) >= max)
      (count, res)
    else
      (count.updated(x, count(x) + 1), x :: res)
  }
  result.reverse
}

Upvotes: 4

experquisite
experquisite

Reputation: 879

Not the prettiest. I look forward to seeing the other solutions.

def noMoreThan(xs: List[Int], max: Int) =
{
  def op(m: Map[Int, Int], a: Int) = {
    m updated (a, m(a) + 1)
  }
  xs.scanLeft( Map[Int,Int]().withDefaultValue(0) ) (op).tail
    .zip(xs)
    .filter{ case (m, a) => m(a) <= max }
    .map(_._2)
}

scala> noMoreThan(l1, 2)
res0: List[Int] = List(1, 2, 3, 1, 3, 2, 5)

Upvotes: 6

G&#225;bor Bakos
G&#225;bor Bakos

Reputation: 9100

Here is a recursive solution (it will stack overflow for large lists):

  def filterAfter[T](l: List[T], max: Int): List[T] = {
    require(max > 1)
    //keep the state of seen values
    val seen = Map[T, Int]().withDefaultValue(0)//init to 0
    def filterAfter(l: List[T], seen: Map[T, Int]): (List[T], Map[T, Int]) = {
      l match {
        case x :: xs =>
          if (seen(x) < max) {
            //Update the state and pass to next
            val pair = filterAfter(xs, seen updated (x, seen(x) + 1))
            (x::pair._1, pair._2)
          } else {
            //already seen more than max
            filterAfter(xs, seen)
          }
        case _ => (l, seen)//empty, terminate recursion
      }
    }
    //call inner recursive function
    filterAfter(l, seen, 2)._1
  }

Upvotes: 1

Related Questions