ovgolovin
ovgolovin

Reputation: 13410

Groupby like Python's itertools.groupby

In Python I'm able to group consecutive elements with the same key by using itertools.groupby:

>>> items = [(1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4)]
>>> import itertools
>>> list(key for key,it in itertools.groupby(items, lambda tup: tup[0]))
[1, 2, 3, 1]

Scala has groupBy as well, but it produces different result - a map pointing from key to all the values found in the iterable with the specified key (not the consecutive runs with the same key):

scala> val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items: List[(Int, Int)] = List((1,2), (1,5), (1,3), (2,9), (3,7), (1,5), (1,4))

scala> items.groupBy {case (key, value) => key}
res0: scala.collection.immutable.Map[Int,List[(Int, Int)]] = Map(2 -> List((2,9)), 1 -> List((1,2), (1,5), (1,3), (1,5), (1,4)), 3 -> List((3,7)))

What is the most eloquent way of achieving the same as with Python itertools.groupby?

Upvotes: 3

Views: 543

Answers (6)

Pedro M Duarte
Pedro M Duarte

Reputation: 28083

Here is a simple solution that I used for a problem I stumbled on at work. In this case I didn't care too much about space, so did not worry about efficient iterators. Used an ArrayBuffer to accumulate the results.

(Don't use this with enormous amounts of data.)

Sequential GroupBy

import scala.collection.mutable.ArrayBuffer

object Main {

  /** Returns consecutive keys and groups from the iterable. */
  def sequentialGroupBy[A, K](items: Seq[A], f: A => K): ArrayBuffer[(K, ArrayBuffer[A])] = {
    val result = ArrayBuffer[(K, ArrayBuffer[A])]()
  
    if (items.nonEmpty) {
      // Iterate, keeping track of when the key changes value.
      var bufKey: K = f(items.head)
      var buf: ArrayBuffer[A] = ArrayBuffer()
  
      for (elem <- items) {
        val key = f(elem)
  
        if (key == bufKey) {
          buf += elem
        } else {
          val group: (K, ArrayBuffer[A]) = (bufKey, buf)
          result += group
          bufKey = key
          buf = ArrayBuffer(elem)
        }
      }
  
      // Append last group.
      val group: (K, ArrayBuffer[A]) = (bufKey, buf)
      result += group
    }
    result
  }
  
  def main(args: Array[String]): Unit = {
    println("\nExample 1:")
    sequentialGroupBy[Int, Int](
      Seq(1, 4, 5, 7, 9, 8, 16), 
      x => x % 2
    ).foreach(println)

    println("\nExample 2:")
    sequentialGroupBy[String, Boolean](
      Seq("pi", "nu", "rho", "alpha", "xi"), 
      x => x.length > 2
    ).foreach(println)
  }
}

Running the above code results in the following:

Example 1:
(1,ArrayBuffer(1))
(0,ArrayBuffer(4))
(1,ArrayBuffer(5, 7, 9))
(0,ArrayBuffer(8, 16))

Example 2:
(false,ArrayBuffer(pi, nu))
(true,ArrayBuffer(rho, alpha))
(false,ArrayBuffer(xi))

Upvotes: 0

igx
igx

Reputation: 4231

hmm couldn't find something out of the box but this will do it

def groupz[T](list:List[T]):List[T] = {
      list match {
      case Nil => Nil
      case x::Nil => List(x)
      case x::xs if (x == xs.head) => groupz(xs)
      case x::xs => x::groupz(xs)
      }}

//now let's add this functionality to List class 
 implicit def addPythonicGroupToList[T](list:List[T]) = new {def pythonGroup = groupz(list)}

and now you can do:

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items.map(_._1).pythonGroup
res1: List[Int] = List(1, 2, 3, 1)

Upvotes: 1

elm
elm

Reputation: 20415

Using List.span, like this

def keyMultiSpan(l: List[(Int,Int)]): List[List[(Int,Int)]] = l match {

  case Nil => List()
  case h :: t =>
    val ms = l.span(_._1 == h._1)
    ms._1 :: keyMultiSpan(ms._2)
}

Hence let

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))

and so

keyMultiSpan(items).map { _.head._1 }
res: List(1, 2, 3, 1)

Update

A more readable syntax, as suggested by @Paul, an implicit class for possibly neater usage, and type parameterisation for generality,

implicit class RichSpan[A,B](val l: List[(A,B)]) extends AnyVal {

  def keyMultiSpan(): List[List[(A,B)]] = l match {

      case Nil => List()
      case h :: t =>
        val (f, r) = l.span(_._1 == h._1)
        f :: r.keyMultiSpan()
  }
}

Thus, use it as follows,

items.keyMultiSpan.map { _.head._1 }
res: List(1, 2, 3, 1)

Upvotes: 2

Rex Kerr
Rex Kerr

Reputation: 167871

If you just want to throw out sequential duplicates, you can do something like this:

def unchain[A](items: Seq[A]) = if (items.isEmpty) items else {
  items.head +: (items zip items.drop(1)).collect{ case (l,r) if r != l => r }
}

That is, just compare the list to a version of itself shifted by one place, and only keep the items which are different. It's easy to add a (same: (a1: A, a2: A) => Boolean) parameter to the method and use !same(l,r) if you want custom behavior for what counts as the same (e.g. do it just by key).

If you want to keep the duplicates, you can use Scala's groupBy to get a very compact (but inefficient) solution:

def groupSequential(items: Seq[A])(same: (a1: A, a2: A) => Boolean) = {
  val ns = (items zip items.drop(1)).
    scanLeft(0){ (n,cc) => if (same(cc._1, cc._2)) n+1 else n }
  (ns zip items).groupBy(_._1).toSeq.sortBy(_._1).map(_._2)
}

Upvotes: 2

Jean Logeart
Jean Logeart

Reputation: 53809

Try:

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val res = compress(items.map(_._1))

/** Eliminate consecutive duplicates of list elements **/
def compress[T](l : List[T]) : List[T] = l match {
  case head :: next :: tail if (head == next) => compress(next :: tail)
  case head :: tail => head :: compress(tail)
  case Nil => List()
}

/** Tail recursive version **/
def compress[T](input: List[T]): List[T] = {
  def comp(remaining: List[T], l: List[T], last: Any): List[T] = {
    remaining match {
      case Nil => l
      case head :: tail if head == last => comp(tail, l, head)
      case head :: tail => comp(tail, head :: l, head)
    }
  }
  comp(input, Nil, Nil).reverse
}

Where compress is the solution of one of the 99 Problems in Scala.

Upvotes: 1

wingedsubmariner
wingedsubmariner

Reputation: 13667

Here is a succinct but inefficient solution:

def pythonGroupBy[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
  items.foldLeft(List[List[T]]()) {
    case (Nil, x) => List(List(x))
    case (g :: gs, x) if f(g.head) == f(x) => (x :: g) :: gs
    case (gs, x) => List(x) :: gs
  }.map(_.reverse).reverse
}

And here is a better one, that only invokes f on each element once:

def pythonGroupBy2[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
  if (items.isEmpty)
    List(List())
  else {
    val state = (List(List(items.head)), f(items.head))
    items.tail.foldLeft(state) { (state, x) =>
      val groupByX = f(x)
      state match {
        case (g :: gs, groupBy) if groupBy == groupByX => ((x :: g) :: gs, groupBy)
        case (gs, _) => (List(x) :: gs, groupByX)
      }
    }._1.map(_.reverse).reverse
  }
}

Both solutions fold over items, building up a list of groups as they go. pythonGroupBy2 also keeps track of the value of f for the current group. At the end, we have to reverse each group and the list of groups in order to get the correct order.

Upvotes: 1

Related Questions