Reputation: 8621
Suppose I have an infinite list of stuff. In this list, I sometimes have something indicating that an hidden message is about to begin, then a message length, a crc, and then an end token. Then the list continues, and somewhere, a new message appears:
a :: b :: start :: 3 :: 1 :: 2 :: 3 :: 4FAA :: end :: x :: y :: z :: ....
What is the most idiomatic (using match
, I think?) of pattern matching this to a structure like:
size = 3
payload = 1 :: 2 :: 3
crc = 4FAA
Also, take into consideration that the token "start" may appear inside the payload, so one must rely on a "full match".
Upvotes: 3
Views: 780
Reputation: 53
I think there are many different ways to solve this. Top of mind for me would be the following recursive solution:
def filterMessages(l:List[Any]):List[List[Any]] = {
l match {
case "start" :: tail => tail.takeWhile(_ != "end") :: filterMessages(tail.dropWhile(_ != "end"))
case a :: tail => filterMessages(tail)
case Nil => Nil
}
}
This approach would return:
scala> val l = "a" :: "b" :: "start" :: 2 :: 1 :: 2:: "crc" :: "end" :: "a" :: "x" :: "start" :: 3 :: 1 :: 2 ::3 :: "crc" :: "end" :: "x" :: Nil
scala> println(filterMessages(l))
res0: List(List(2, 1, 2, crc), List(3, 1, 2, 3, crc))
If you have a 'really long' (infinite) list you should make this algorithm tail recursive. The tail-recursive solution would look like this (yields the same result as above):
import scala.annotation.tailrec
@tailrec def filterMessages(l:List[Any], result:List[List[Any]]):List[List[Any]] = {
l match {
case "start" :: tail => filterMessages(tail.dropWhile(_ != "end"), tail.takeWhile(_ != "end") :: result)
case a :: tail => filterMessages(tail, result)
case Nil => result
}
}
scala> println(filterMessages(l, Nil))
res0: List(List(2, 1, 2, crc), List(3, 1, 2, 3, crc))
In terms of processing I would pass in a function that processes a message instead of concatenating each message in a list. This solution would look like this:
def processMessages(l: List[Any])(process: (List[Any]) => Unit): Unit = {
l match {
case "start" :: tail => process(tail.takeWhile(_ != "end")); processMessages(tail.dropWhile(_ != "end"))(process)
case a :: tail => processMessages(tail)(process)
case Nil => Nil
}
}
Usage:
scala> processMessages(l) { println }
Upvotes: 2
Reputation: 297195
Use parsers combinator. The exact solution for you seems to be parsing tokens, but, to simplify, I'll assume you're just reading a string of words separated by spaces.
object P extends scala.util.parsing.combinator.RegexParsers {
def message: Parser[Any] = properMessage | dummy ~> message
def properMessage = start ~> body <~ end
def start = "(?i)start".r
def end = "(?i)end".r
def body = (size >> payload) ~ crc
def crc = word
def size = "\\d+".r ^^ (_.toInt)
def payload = repN(_: Int, word)
def word = "\\S+".r
def dummy = word
}
And, using it:
scala> val stream = "a b start 3 1 2 3 4FAA end x y z "
stream: String = "a b start 3 1 2 3 4FAA end x y z "
scala> P.parse(P.message, stream)
res5: P.ParseResult[Any] = [1.35] parsed: (List(1, 2, 3)~4FAA)
Now, RegexParsers
parse a stream of Char
. Since you have a stream of tokens, StandardTokenParsers
might be a more suited class. Or you may base it on Parsers
, and define Elem
to whatever fit your needs.
Upvotes: 8
Reputation: 8276
The grammar you've described is a regular language. Probably best to define a custom extractor object with an unapply
method that can parse your list of tokens into a list of your structures.
Upvotes: 2