Will I Am

Reputation: 2672

Scala - byte array of UTF8 strings

I have a byte array (more precisely, a ByteString) of UTF-8 strings, each prefixed by its length as 2 bytes (MSB, LSB). For example:

val z = akka.util.ByteString(0, 3, 'A', 'B', 'C', 0, 5, 
        'D', 'E', 'F', 'G', 'H',0,1,'I')

I would like to convert this to a list of strings, so the result should be similar to List("ABC", "DEFGH", "I").

Is there an elegant way to do this?

(EDIT) These strings are NOT null terminated, the 0 you are seeing in the array is just the MSB. If the strings were long enough, the MSB would be greater than zero.

Upvotes: 1

Views: 3499

Answers (3)

Lionel Port

Reputation: 3542

Edit: Updated based on the clarification in the comments that the first 2 bytes define an int, so the length is now assembled manually from those two bytes.

def convert(bs: List[Byte]) : List[String] = {
  bs match {
    case count_b1 :: count_b2 :: t =>
      val count =  ((count_b1 & 0xff) << 8) | (count_b2 & 0xff)
      val (chars, leftover) = t.splitAt(count)
      new String(chars.toArray, "UTF-8") :: convert(leftover)
    case _ => List()
  }
}

Call it as convert(z.toList).
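
Because the recursive call is not in tail position, very long inputs may use a lot of stack. A minimal loop-based sketch using only java.nio.ByteBuffer from the JDK, assuming the same layout (2-byte big-endian length, then UTF-8 payload). The name decodeAll is hypothetical, and decoding a truncated final record from whatever bytes remain is an assumption, not something the question specifies:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ListBuffer

// Hypothetical helper: decodes [2-byte big-endian length][UTF-8 payload] records.
def decodeAll(bytes: Array[Byte]): List[String] = {
  val buf = ByteBuffer.wrap(bytes)          // ByteBuffer is big-endian by default
  val out = ListBuffer.empty[String]
  // Stop once fewer than 2 bytes remain, since no full length prefix is left.
  while (buf.remaining() >= 2) {
    val len = buf.getShort() & 0xffff       // mask so the prefix is read as unsigned
    // Assumption: a truncated final record uses whatever bytes are left.
    val payload = new Array[Byte](math.min(len, buf.remaining()))
    buf.get(payload)
    out += new String(payload, StandardCharsets.UTF_8)
  }
  out.toList
}

// The same bytes as in the question, built from plain arrays.
val sample: Array[Byte] =
  Array[Byte](0, 3) ++ "ABC".getBytes(StandardCharsets.UTF_8) ++
  Array[Byte](0, 5) ++ "DEFGH".getBytes(StandardCharsets.UTF_8) ++
  Array[Byte](0, 1) ++ "I".getBytes(StandardCharsets.UTF_8)

val decoded = decodeAll(sample)
```

With the ByteString from the question, this could be called as decodeAll(z.toArray).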

Upvotes: 1

suztomo

Reputation: 5202

Here is my answer with foldLeft.

def convert(z : ByteString) = z.foldLeft((List() : List[String], ByteString(), 0, 0))((p, b : Byte) => {
  p._3 match {
    // Mask with 0xff: Byte is signed, so b.toInt would go negative for bytes >= 0x80.
    case 0 if p._2.nonEmpty => (p._2.utf8String :: p._1, ByteString(), -1, b & 0xff)
    case 0 => (p._1, p._2, -1, b & 0xff)
    case -1 => (p._1, p._2, (p._4 << 8) + (b & 0xff), 0)
    case _ => (p._1, p._2 :+ b, p._3 - 1, 0)
  }
})

It works like this:

scala> val bs = ByteString(0, 3, 'A', 'B', 'C', 0, 5,  'D', 'E', 'F', 'G', 'H',0,1,'I')
scala>   val k = convert(bs); (k._2.utf8String :: k._1).reverse
k: (List[String], akka.util.ByteString, Int, Int) = (List(DEFGH, ABC),ByteString(73),0,0)
res20: List[String] = List(ABC, DEFGH, I)
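
One detail worth checking in any byte-by-byte approach: Byte is signed on the JVM, so a length byte of 0x80 or above turns negative under toInt unless it is masked. A small sketch of the difference (prefixLength is a hypothetical helper, not part of the answer above):

```scala
// Hypothetical helper: combine the two prefix bytes (MSB, LSB) into an unsigned length.
def prefixLength(msb: Byte, lsb: Byte): Int =
  ((msb & 0xff) << 8) | (lsb & 0xff)

val lsb: Byte = 200.toByte           // stored as -56 because Byte is signed
val naive  = (0 << 8) + lsb.toInt    // sign-extended, so the "length" is negative
val masked = prefixLength(0, lsb)    // masked, so the length is 200
```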

Upvotes: 0

elm

Reputation: 20405

Consider the multiSpan method as defined here, which applies span repeatedly over a given list:

z.multiSpan(_ == 0).map( _.drop(2).map(_.toChar).mkString )

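Here the spanning condition is whether an item equals 0, which assumes every length prefix has a zero MSB (that is, every string is shorter than 256 bytes) and that no payload byte is 0. We then drop the two prefix bytes and convert the remaining bytes to a String one character per byte.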

Note: when using multiSpan, remember to import annotation.tailrec.
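
The linked definition is not reproduced in this thread. As a rough illustration only, here is a hypothetical stand-in for a repeated-span helper, assuming it starts a new group at each element that matches the predicate; the real multiSpan may differ in signature and behaviour:

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for the linked multiSpan; the real definition may differ.
// Starts a new group at every element for which splitOn holds.
def multiSpan[A](xs: List[A])(splitOn: A => Boolean): List[List[A]] = {
  @tailrec
  def loop(rest: List[A], acc: List[List[A]]): List[List[A]] = rest match {
    case Nil => acc.reverse
    case h :: t =>
      // Take everything up to (not including) the next matching element.
      val (pre, post) = t.span(x => !splitOn(x))
      loop(post, (h :: pre) :: acc)
  }
  loop(xs, Nil)
}

// The bytes from the question, written as numeric values ('A' is 65, and so on).
val bytes: List[Byte] = List[Byte](0, 3, 65, 66, 67, 0, 5, 68, 69, 70, 71, 72, 0, 1, 73)
val strings = multiSpan(bytes)(_ == 0).map(_.drop(2).map(_.toChar).mkString)
```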

Upvotes: 0
