Reputation: 4056
I've been looking for a method similar to String.split in Scala Array, but I've not been able to find it.
What I want to do is to split an array by a separator.
For example, separating the following array:
val array = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
using the '\n'
separator, should result in:
List(Array(a, b), Array(c, d, e), Array(g))
I know that I can convert the Array to String, and apply split there:
array.mkString.split('\n').map(_.toArray)
but I would prefer to skip the conversion.
The solution I have so far involves using span recursively and is a bit too boilerplate:
def splitArray[T](array: Array[T], separator: T): List[Array[T]] = {
def spanRec(array: Array[T], aggResult: List[Array[T]]): List[Array[T]] = {
val (firstElement, restOfArray) = array.span(_ != separator)
if (firstElement.isEmpty) aggResult
else spanRec(restOfArray.dropWhile(_ == separator), firstElement :: aggResult)
}
spanRec(array, List()).reverse
}
I'm sure there must be something in Scala I'm missing. Any idea?
thanks, Ruben
Upvotes: 14
Views: 15788
Reputation: 12794
I came up with a solution that aims at the following:
Array
just like a Vector
, and a collection of Char
s just like a collection of arbitrary objectsArray[A]
gets split in an Array[Array[A]]
, a Vector[A]
gets split in a Vector[Vector[A]]
Iterator
)split
method on your collection)Before getting to the explanation, note that you can play with the code that follows here on Scastie.
The first step is implementing an Iterator
that chunks your collection:
import scala.language.higherKinds
import scala.collection.generic.CanBuildFrom
final class Split[A, CC[_]](delimiter: A => Boolean, as: CC[A])(
implicit view: CC[A] => Seq[A], cbf: CanBuildFrom[Nothing, A, CC[A]])
extends Iterator[CC[A]] {
private[this] var it: Iterator[A] = view(as).iterator
private def skipDelimiters() = {
it = it.dropWhile(delimiter)
}
skipDelimiters()
override def hasNext: Boolean = it.hasNext
override def next(): CC[A] = {
val builder = cbf()
builder ++= it.takeWhile(!delimiter(_))
skipDelimiters()
builder.result()
}
}
I'm using a predicate instead of a value to be more elastic in how the collection gets split, especially when splitting a collection of non-scalar values (like Char
s).
I'm using an implicit view on the collection type to be able to apply this to all kinds of collection that can be seen as a Seq
(like Vector
s and Array
s) and a CanBuildFrom
to be able to build the exact type of collection I'm receiving as an input.
The implementation of the Iterator
simply makes sure to drop delimiters and chunk the rest.
We can now use an implicit class
to offer a friendly interface and add the split
method to all the collections, both allowing a predicate or a value to be defined as delimiters:
final implicit class Splittable[A, CC[_]](val as: CC[A])(implicit ev1: CC[A] => Seq[A], ev2: CanBuildFrom[Nothing, A, CC[A]], ev3: CanBuildFrom[Nothing, CC[A], CC[CC[A]]]) {
def split(delimiter: A => Boolean): CC[CC[A]] = new Split(as)(delimiter).to[CC]
def split(delimiter: A): CC[CC[A]] = new Split(as)(_ == delimiter).to[CC]
}
Now you can use your method freely on collection of Char
s
val a = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
val b = List('\n', '\n', '\n')
val c = Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n')
val d = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
val e = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
a.split('\n')
b.split('\n')
c.split('\n')
d.split('\n')
e.split('\n')
and arbitrary objects alike:
final case class N(n: Int, isDelimiter: Boolean)
Vector(N(1, false), N(2, false), N(3, true), N(4, false), N(5, false)).split(_.isDelimiter)
Note that by using the iterator directly you use a lazy approach, as you can see if you add a debug print to the next
method and try to execute the following:
new Split(Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n'))(_ == '\n'}).take(1).foreach(println)
If you want, you can add a couple of methods to Splittable
that return an Iterator
, so that you can expose the lazy approach as well directly through it.
Upvotes: 1
Reputation: 61666
Almost a one-liner:
val it = array.iterator
List.range(0, array.count(_ == '\n')).map(_ => it.takeWhile(_ != '\n').toArray)
For a given array
, this makes use of an Iterator
version of the Array
in order to call .takeWhile
as many times as there are occurrences of the separator.
Another version of the same a bit shorter although less readable, using List.tabulate
which is a map over a range:
val it = array.iterator
List.tabulate(array.count(_ == '\n'))(_ => it.takeWhile(_ != '\n').toArray)
This can be made a generic equivalent of array.mkString.split("\n", -1).map(_.toArray)
by pimping Array
with:
implicit class ArrayExtensions[T: ClassTag](array: Array[T]) {
def split(sep: T): List[Array[T]] = {
val it = array.iterator
List.range(0, array.count(_ == sep)).map(_ => it.takeWhile(_ != sep).toArray)
}
}
and used this way:
Array('\n', '\n', 'a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n').split('\n')
// List(Array(), Array(), Array(a, b), Array(c, d, e), Array(g), Array())
To get rid of empty sub-arrays resulting from 2 successive occurrences of the separator, one can pipe the result with .filter(_.nonEmpty)
.
Upvotes: 0
Reputation: 1192
Simple way of doing it using foldLeft
val f = array.foldLeft((Array[Char](),List[Array[Char]]()))(
(acc, char: Char) => {
char match {
case '\n' => (Array(),acc._1 :: acc._2)
case _ => (acc._1 :+ char,acc._2)
}
}
)._2.reverse
Upvotes: 1
Reputation: 93
Pimped version of generic sequence / array split -
implicit def toDivide[A, B <% TraversableLike[A, B]](a : B) = new {
private def divide(x : B, condition: (A) => Boolean) : Iterable[B] = {
if (x.size > 0)
x.span(condition) match {
case (e, f) => if (e.size > 0) Iterable(e) ++ divide(f.drop(1),condition) else Iterable(f)
}
else
Iterable()
}
def divide(condition: (A) => Boolean): Iterable[B] = divide(a, condition)
}
Upvotes: 0
Reputation: 3872
You can also accomplish this using fold:
def splitArray[T](array: Array[T], separator: T) =
array.foldRight(List(List.empty[T])) { (c, list) =>
if (c == separator) Nil :: list
else (c :: list.head) :: list.tail
}.filter(!_.isEmpty).map(_.reverse).toArray
which was already mentioned by lambda.xy.x, but for some reason it was a bit less readable then necessary ;)
Upvotes: 0
Reputation: 4998
This is a short formulation that should do the job:
def split(array:Array[Char], sep:Char) : Array[Array[Char]] = {
/* iterate the list from right to left and recursively calculate a
pair (chars,list), where chars contains the elements encountered
since the last occurrence of sep.
*/
val (chars, list) = array.foldRight[(List[Char],List[Array[Char]])]((Nil,Nil))((x,y) => if (x == sep) (Nil, (y._1.toArray)::y._2) else (x::y._1, y._2) );
/* if the last element was sep, do nothing;
otherwise prepend the last collected chars
*/
if (chars.isEmpty)
list.toArray
else
(chars.toArray::list).toArray
}
/* example:
scala> split(array,'\n')
res26: Array[Array[Char]] = Array(Array(a, b), Array(c, d, e), Array(g), Array())
*/
If we use List instead of Array, we can generalize the code a bit:
def split[T](array:List[T], char:T) : List[List[T]] = {
val (chars, list) = array.foldRight[(List[T],List[List[T]])]((Nil,Nil))((x,y) => if (x == char) (Nil, (y._1)::y._2) else (x::y._1, y._2) )
if (chars.isEmpty) list else (chars::list)
}
/* example:
scala> split(array.toList, '\n')
res32: List[List[Char]] = List(List(a, b), List(c, d, e), List(g), List())
scala> split(((1 to 5) ++ (1 to 5)).toList, 3)
res35: List[List[Int]] = List(List(1, 2), List(4, 5, 1, 2), List(4, 5))
*/
If this solution is considered as elegant or unreadable is left to the reader and her preference for functional programming :)
Upvotes: 1
Reputation: 16245
How about this? No reflection, and not recursive either but tries to use as much of the scala library as possible.
def split[T](a: Array[T], sep: T)(implicit m:ClassManifest[T]): Array[Array[T]] = {
val is = a.indices filter (a(_) == sep)
(0 +: (is map (1+))) zip (is :+ (a.size+1)) map {
case(from,till) => a.slice(from, till)
}
}
Probably slow, but just for fun. :-)
The indices filter
gives you the indices (is
) of where your separator was found.
In your example, that's 2,6,8
. I think this is O(n)
.
The next line transforms that into (0,2), (3,6), (7,8), (9, 10)
. So k
separators yield k+1
ranges.
These are handed to slice
, which does the rest of the work. The transformation is also O(n)
where n
is the number of separators found.
(This means an input of Array[Char]()
will yield Array(Array())
and not the more intuitive Array()
but that's not too interesting).
The array appending/prepending (:+
, +:
) is wasteful using arrays, but nothing that can't be solved by using an appropriate collection that lets you have O(1)
appends/prepends.
Upvotes: 0
Reputation: 42047
You can use the span
method to split the array into two parts and then call your split method recursively on the second part.
import scala.reflect.ClassTag
def split[A](l:Array[A], a:A)(implicit act:ClassTag[Array[A]]):Array[Array[A]] = {
val (p,s) = l.span(a !=)
p +: (if (s.isEmpty) Array[Array[A]]() else split(s.tail,a))
}
This is not very efficient though, since it has quadratic performance. If you want something fast, a simple tail recursive solution will probably be the best approach.
With lists instead of arrays you would get linear performance and wouldn't need reflection.
Upvotes: 1
Reputation: 144
A borrowed the arguments from sschaef's solution:
def split[T](array : Array[T])(where : T=>Boolean) : List[Array[T]] = {
if (array.isEmpty) Nil
else {
val (head, tail) = array span {!where(_)}
head :: split(tail drop 1)(where)
}
} //> split: [T](array: Array[T])(where: T => Boolean)List[Array[T]]
val array = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
split(array){_ =='\n'} //> res2: List[Array[Char]] = List(Array(a, b), Array(c, d, e), Array(g))
def splitByNewLines(array : Array[Char]) = split(array){_ =='\n'}
splitByNewLines(array) //> res3: List[Array[Char]] = List(Array(a, b), Array(c, d, e), Array(g))
Upvotes: 1
Reputation: 12852
This is not the most concise implementation, but it should be fairly performed and preserves the array type without resorting to reflection. The loop can of course be replaced by a recursion.
Since your question doesn't explicitly state what should be done with the separator I assume, that they should not cause any entry in the output list (see the test cases below).
def splitArray[T](xs: Array[T], sep: T): List[Array[T]] = {
var (res, i) = (List[Array[T]](), 0)
while (i < xs.length) {
var j = xs.indexOf(sep, i)
if (j == -1) j = xs.length
if (j != i) res ::= xs.slice(i, j)
i = j + 1
}
res.reverse
}
Some tests:
val res1 =
// Notice the two consecutive '\n'
splitArray(Array('a', 'b', '\n', 'c', 'd', 'e', '\n', '\n', 'g', '\n'), '\n')
println(res1)
// List([C@12189646, [C@c31d6f2, [C@1c16b01f)
res1.foreach(ar => {ar foreach print; print(" ")})
// ab cde g
// No separator
val res2 = splitArray(Array('a', 'b'), '\n')
println(res2)
// List([C@3a2128d0)
res2.foreach(ar => {ar foreach print; print(" ")})
// ab
// Only separators
val res3 = splitArray(Array('\n', '\n'), '\n')
println(res3)
// List()
Upvotes: 3
Reputation: 53348
I don't know of any build-in method, but I came up with a simpler one than yours:
def splitOn[A](xs: List[A])(p: A => Boolean): List[List[A]] = xs match {
case Nil => Nil
case x :: xs =>
val (ys, zs) = xs span (!p(_))
(x :: ys) :: splitOn(zs.tail)(p)
}
// for Array
def splitOn[A : reflect.ClassTag](xs: Array[A])(p: A => Boolean): List[Array[A]] =
if (xs.isEmpty) List()
else {
val (ys, zs) = xs.tail span (!p(_))
(xs.head +: ys) :: splitOn(zs.tail)(p)
}
scala> val xs = List('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
xs: List[Char] =
List(a, b,
, c, d, e,
, g,
)
scala> splitOn(xs)(_ == '\n')
res7: List[List[Char]] = List(List(a, b), List(c, d, e), List(g))
Upvotes: 0