djhworld
djhworld

Reputation: 6786

Splitting string into groups

I'm trying to 'group' a string into segments, I guess this example would explain it more succintly

scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeee","ff","g")

I can thnk of a few ways to do this in an imperative style (with vars and stepping through the string to find groups) but I was wondering if any better functional solution could be attained? I've been looking through the Scala API but there doesn't seem to be something that fits my needs.

Any help would be appreciated

Upvotes: 14

Views: 7284

Answers (9)

Xavier Guihot
Xavier Guihot

Reputation: 61774

Starting Scala 2.13, List is now provided with the unfold builder which can be combined with String::span:

List.unfold("aaaabbaaacdeeffg") {
  case ""   => None
  case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")

or alternatively, coupled with Scala 2.13's Option#unless builder:

List.unfold("aaaabbaaacdeeffg") {
  rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")

Details:

  • Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
  • For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is very fortunate in this context as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
  • The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.

Upvotes: 1

Guillem I
Guillem I

Reputation: 98

If you want to use scala API you can use the built in function for that:

str.groupBy(c => c).values

Or if you mind it being sorted and in a list:

str.groupBy(c => c).values.toList.sorted

Upvotes: 0

Thomas Jung
Thomas Jung

Reputation: 33102

You can split the string recursively with span:

def s(x : String) : List[String] = if(x.size == 0) Nil else {
    val (l,r) = x.span(_ == x(0))
    l :: s(r) 
}

Tail recursive:

@annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
    if(x.size == 0) y.reverse 
    else {
        val (l,r) = x.span(_ == x(0))
        s(r, l :: y)
    }
}

Upvotes: 22

Martin Ring
Martin Ring

Reputation: 5426

def group(s: String): List[String] = s match {
  case "" => Nil
  case s  => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}

Edit: Tail recursive version:

def group(s: String, result: List[String] = Nil): List[String] = s match {
  case "" => result reverse
  case s  => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}

can be used just like the other because the second parameter has a default value and thus doesnt have to be supplied.

Upvotes: 13

Raphael
Raphael

Reputation: 10589

A functional* solution using fold:

def group(s : String) : Seq[String] = {
  s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
    if ( carry.last(0) == elem ) {
      carry.init :+ (carry.last + elem)
    }
    else {
      carry :+ elem.toString
    }
  }
}

There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity heavily depends on the kind of Seq strings are converted to.

(*) Afaik all/most operations in the collection library depend in iterators, an imho inherently unfunctional concept. But the code looks functional, at least.

Upvotes: 1

tenshi
tenshi

Reputation: 26586

Seems that all other answers are very concentrated on collection operations. But pure string + regex solution is much simpler:

str split """(?<=(\w))(?!\1)""" toList

In this regex I use positive lookbehind and negative lookahead for the captured char

Upvotes: 16

Raphael
Raphael

Reputation: 10589

Edit: Have to read more carefully. Below is no functional code.

Sometimes, a little mutable state helps:

def group(s : String) = {
  var tmp = ""
  val b = Seq.newBuilder[String]

  s.foreach { c =>
    if ( tmp != "" && tmp.head != c ) {
      b += tmp
      tmp = ""
    }

    tmp += c
  }
  b += tmp

  b.result
}

Runtime O(n) (if segments have at most constant length) and tmp.+= probably creates the most overhead. Use a string builder instead for strict runtime in O(n).

group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)

Upvotes: 0

Rustem Suniev
Rustem Suniev

Reputation: 1149

Make it one-liner:

scala>  val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff

scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)

UPDATE:

As @Paul mentioned about the order here is updated version:

scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)

Upvotes: 4

Anne
Anne

Reputation: 163

You could use some helper functions like this:

val str = "aaaabbcddddeeeeefff"

def zame(chars:List[Char]) = chars.partition(_==chars.head)

def q(chars:List[Char]):List[List[Char]] = chars match {
    case Nil => Nil
    case rest =>
        val (thesame,others) = zame(rest)
        thesame :: q(others)
}

q(str.toList) map (_.mkString)

This should do the trick, right? No doubt it can be cleaned up into one-liners even further

Upvotes: 1

Related Questions