Reputation: 8621

Match with arbitrary list size in Scala

Suppose I have an infinite list of stuff. In this list, I sometimes have something indicating that an hidden message is about to begin, then a message length, a crc, and then an end token. Then the list continues, and somewhere, a new message appears:

a :: b :: start :: 3 :: 1 :: 2 :: 3 :: 4FAA :: end :: x :: y :: z :: ....

What is the most idiomatic (using match, I think?) of pattern matching this to a structure like:

size = 3
payload = 1 :: 2 :: 3
crc = 4FAA

Also, take into consideration that the token "start" may appear inside the payload, so one must rely on a "full match".

Upvotes: 3

Answers (3)

Urs Peter

Reputation: 53

I think there are many different ways to solve this. Top of mind for me would be the following recursive solution:

def filterMessages(l:List[Any]):List[List[Any]] = {
    l match {
      case "start" :: tail => tail.takeWhile(_ != "end") :: filterMessages(tail.dropWhile(_ != "end"))
      case a :: tail => filterMessages(tail)
      case Nil  => Nil
    }
 }

This approach would return:

scala> val l = "a" :: "b" :: "start" :: 2 :: 1 :: 2:: "crc" :: "end" :: "a" :: "x"  ::  "start" :: 3 :: 1 :: 2 ::3 :: "crc" :: "end" :: "x" :: Nil
scala> println(filterMessages(l))
res0: List(List(2, 1, 2, crc), List(3, 1, 2, 3, crc))

If you have a 'really long' (infinite) list you should make this algorithm tail recursive. The tail-recursive solution would look like this (yields the same result as above):

import scala.annotation.tailrec
@tailrec def filterMessages(l:List[Any], result:List[List[Any]]):List[List[Any]] = {
    l match {
      case "start" :: tail => filterMessages(tail.dropWhile(_ != "end"), tail.takeWhile(_ != "end") :: result)
      case a :: tail => filterMessages(tail, result)
      case Nil  => result
    }
  }

 scala> println(filterMessages(l, Nil))
 res0: List(List(2, 1, 2, crc), List(3, 1, 2, 3, crc))

In terms of processing I would pass in a function that processes a message instead of concatenating each message in a list. This solution would look like this:

def processMessages(l: List[Any])(process: (List[Any]) => Unit): Unit = {
   l match {
      case "start" :: tail => process(tail.takeWhile(_ != "end")); processMessages(tail.dropWhile(_ != "end"))(process)
      case a :: tail => processMessages(tail)(process)
      case Nil => Nil
    }
}

Usage:

scala> processMessages(l) { println }

Upvotes: 2

Daniel C. Sobral

Reputation: 297195

Use parsers combinator. The exact solution for you seems to be parsing tokens, but, to simplify, I'll assume you're just reading a string of words separated by spaces.

object P extends scala.util.parsing.combinator.RegexParsers {
  def message: Parser[Any] = properMessage | dummy ~> message
  def properMessage = start ~> body <~ end
  def start = "(?i)start".r
  def end = "(?i)end".r
  def body = (size >> payload) ~ crc
  def crc = word
  def size = "\\d+".r ^^ (_.toInt)
  def payload = repN(_: Int, word)
  def word = "\\S+".r
  def dummy = word
}

And, using it:

scala> val stream = "a  b  start  3  1  2  3  4FAA  end  x  y  z "
stream: String = "a  b  start  3  1  2  3  4FAA  end  x  y  z "

scala> P.parse(P.message, stream)
res5: P.ParseResult[Any] = [1.35] parsed: (List(1, 2, 3)~4FAA)

Now, RegexParsers parse a stream of Char. Since you have a stream of tokens, StandardTokenParsers might be a more suited class. Or you may base it on Parsers, and define Elem to whatever fit your needs.

Upvotes: 8

mergeconflict

Reputation: 8276

The grammar you've described is a regular language. Probably best to define a custom extractor object with an unapply method that can parse your list of tokens into a list of your structures.

Upvotes: 2

Match with arbitrary list size in Scala

Answers (3)

Related Questions