ixx

Reputation: 32273

Multiline regex capture in Scala

I'm trying to capture content that spans multiple lines with a regex, but it doesn't match.

val text = """<p>line1 
    line2</p>"""

val regex = """(?m)<p>(.*?)</p>""".r

var result = regex.findFirstIn(text).getOrElse("")

This returns an empty string.

I added the (?m) flag for multiline mode, but it doesn't seem to help in this case.

If I remove the line break the regex works.

I also found this but couldn't get it working.

How do I match the content between the <p> tags? I want everything in between, including the line breaks.

Thanks in advance!

Upvotes: 13

Views: 6972

Answers (2)

som-snytt

Reputation: 39577

In case it's not obvious at this point, "How do I match the content":

scala> val regex = """(?s)<p>(.*?)</p>""".r

scala> (regex findFirstMatchIn text).get group 1
res52: String = 
line1 
    line2

More idiomatically,

scala> text match { case regex(content) => content }
res0: String =
line1
    line2

scala> val embedded = s"stuff${text}morestuff"
embedded: String =
stuff<p>line1
    line2</p>morestuff

scala> val regex = """(?s)<p>(.*?)</p>""".r.unanchored
regex: scala.util.matching.UnanchoredRegex = (?s)<p>(.*?)</p>

scala> embedded match { case regex(content) => content }
res1: String =
line1
    line2
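One caveat with the pattern-match form above: a bare `case regex(content)` throws a MatchError when nothing matches, and a plain (anchored) Regex must match the entire input. A minimal sketch of a safer variant follows; the object name `SafeCapture` and the helper `capture` are made up for illustration, not part of the standard library.

```scala
import scala.util.matching.Regex

object SafeCapture {
  // Anchored by default: in a `case`, the pattern must match the whole input.
  val anchored: Regex = """(?s)<p>(.*?)</p>""".r
  // Unanchored: the pattern may match anywhere inside the input.
  val unanchored: Regex = anchored.unanchored

  // Hypothetical helper: returns None instead of throwing MatchError.
  def capture(regex: Regex, input: String): Option[String] = input match {
    case regex(content) => Some(content)
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    val embedded = "stuff<p>line1\n    line2</p>morestuff"
    println(capture(anchored, embedded))              // None: surrounding text is not covered
    println(capture(unanchored, embedded))            // the two captured lines, wrapped in Some
    println(capture(unanchored, "no paragraph here")) // None
  }
}
```

Returning an Option keeps the "no match" case explicit at the call site, much like the getOrElse("") in the question.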

Upvotes: 7

Casimir et Hippolyte

Reputation: 89557

If you want to activate dotall mode in Scala, you must use (?s) instead of (?m).

(?s) means the dot can match newlines

(?m) means ^ and $ match at the beginning and end of each line, not just of the whole input
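
To make the difference between the two flags concrete, here is a small sketch that runs both against a two-line input. The object name `DotallVsMultiline` and the `lineStarts` pattern are only illustrative; the two `<p>` patterns are the ones from the question and the answers.

```scala
object DotallVsMultiline {
  val text = "<p>line1\n    line2</p>"

  // (?m) only changes ^ and $; '.' still stops at '\n', so nothing matches.
  val multiline = """(?m)<p>(.*?)</p>""".r

  // (?s) lets '.' match line terminators, so the group spans both lines.
  val dotall = """(?s)<p>(.*?)</p>""".r

  // Where (?m) does matter: ^ matches at the start of every line.
  val lineStarts = """(?m)^\s*line\d""".r

  def main(args: Array[String]): Unit = {
    println(multiline.findFirstIn(text))                      // None
    println(dotall.findFirstMatchIn(text).map(_.group(1)))    // Some(...) holding both lines
    println(lineStarts.findAllIn("line1\n    line2").toList)  // two matches, one per line
  }
}
```

The two flags can also be combined as (?sm) if a pattern needs both behaviours.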

Upvotes: 27
