Reputation: 8821
I would like to implement a simple wiki-like markup parser as an exercise in using Scala parser combinators.
I would like to solve this bit by bit, so here is what I would like to achieve in the first version: a simple inline literal markup.
For example, if the input string is:
This is a sytax test ``code here`` . Hello ``World``
The output string should be:
This is a sytax test <code>code here</code> . Hello <code>World</code>
I tried to solve this using RegexParsers, and here is what I have so far:
import scala.util.parsing.combinator._
import scala.util.parsing.input._
object TestParser extends RegexParsers {
  override val skipWhitespace = false

  def toHTML(s: String) = "<code>" + s.drop(2).dropRight(2) + "</code>"

  val words = """(.)""".r
  val literal = """\B``(.)*``\B""".r ^^ toHTML
  val markup = (literal | words)*

  def run(s: String) = parseAll(markup, s) match {
    case Success(xs, next) => xs.mkString
    case _ => "fail"
  }
}

println(TestParser.run("This is a sytax test ``code here`` . Hello ``World``"))
With this code, a simpler input that contains only one <code> markup works fine. For example:
This is a sytax test ``code here``.
becomes
This is a sytax test <code>code here</code>.
But when I run it with the full example above, it yields
This is a sytax test <code>code here`` . Hello ``World</code>
I think this is because the regex I use:
"""\B``(.)*``\B""".r
allows any characters, including `` itself, between a pair of `` delimiters.
How should I restrict the pattern so that `` cannot appear nested inside a pair, and so fix this problem?
Upvotes: 0
Views: 156
Reputation: 19761
Here are some docs on non-greedy matching:
http://www.exampledepot.com/egs/java.util.regex/Greedy.html
Basically, the match starts at the first `` and extends as far as it can, so it ends at the closing `` after World rather than at the one after "code here".
By putting a ? after your *, you tell it to do the shortest match possible, instead of the longest match.
Another option is to use [^`]* (anything EXCEPT `), and that will force it to stop earlier.
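To make the difference concrete, here is a minimal sketch that runs the greedy, non-greedy, and negated-class patterns over the example input using only scala.util.matching.Regex. The object name GreedyDemo and the helper spans are made up for illustration, and the \B anchors from the question are left out to keep the comparison simple.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; the names here are not from the question.
object GreedyDemo {
  val input = "This is a sytax test ``code here`` . Hello ``World``"

  // Greedy: .* grabs as much as possible, then backtracks to the LAST ``.
  val greedy: Regex = """``(.*)``""".r

  // Non-greedy: .*? stops at the FIRST closing `` it can reach.
  val lazyRe: Regex = """``(.*?)``""".r

  // Negated class: the body may not contain a backtick at all.
  val negated: Regex = """``([^`]*)``""".r

  // Returns the text captured between each pair of `` delimiters.
  def spans(re: Regex): List[String] =
    re.findAllMatchIn(input).map(_.group(1)).toList
}
```

With the greedy pattern, spans returns a single element covering everything from "code here" through "World", while the other two patterns each return the two separate spans. Note that the negated class also rejects a single backtick inside the literal, which may or may not be what you want.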
Upvotes: 2
Reputation: 51109
I don't know much about regex parsers, but you can use a simple 1-liner:
def addTags(s: String) =
"""(``.*?``)""".r replaceAllIn (
s, m => "<code>" + m.group(0).replace("``", "") + "</code>")
Test:
scala> addTags("This is a sytax test ``code here`` . Hello ``World``")
res0: String = This is a sytax test <code>code here</code> . Hello <code>World</code>
Upvotes: 0
Reputation: 8821
After some trial and error, I found that the following regex seems to work:
"""``(.)*?``"""
Upvotes: 0
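For reference, this is roughly how the non-greedy regex could slot into the parser from the question. It is only a sketch: it assumes the scala-parser-combinators library is on the classpath (it is a separate module in recent Scala versions), and the object name FixedParser is made up for illustration.

```scala
import scala.util.parsing.combinator._

// Hypothetical name; otherwise the same structure as the question's TestParser.
object FixedParser extends RegexParsers {
  // Keep whitespace so that ordinary text is reproduced unchanged.
  override val skipWhitespace = false

  // Strip the surrounding `` delimiters and wrap the body in <code> tags.
  def toHTML(s: String): String =
    "<code>" + s.drop(2).dropRight(2) + "</code>"

  // Non-greedy body: stops at the first closing `` instead of the last one.
  val literal: Parser[String] = """``.*?``""".r ^^ toHTML

  // Fallback: any single character, passed through as-is.
  val words: Parser[String] = """.""".r

  val markup: Parser[List[String]] = rep(literal | words)

  def run(s: String): String = parseAll(markup, s) match {
    case Success(xs, _) => xs.mkString
    case _              => "fail"
  }
}
```

Calling FixedParser.run on the example input from the question should then wrap "code here" and "World" in separate <code> tags. Since . does not match newlines by default, a literal that spans lines would still fall through to the single-character parser.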