machine-wisdom
machine-wisdom

Reputation: 133

Scala regex and for comprehension

I am trying to reason about how for comprehension works, because it is doing something different from what I expect it to do. I read several answers, the most relevant of which is this one Scala "<-" for comprehension However, I am still perplexed.

The following code works as expected. It prints lines where the values matched by two different Regexes are not equal (one for the value in a session cookie and another for the value in the GET args, just to give context):

  file.getLines().foreach { line =>
      val whidSession: String = rWhidSession.findAllMatchIn(line) flatMap {m => m.group(1)} mkString ""
      val whidArg: String = rWhidArg.findAllMatchIn(line) flatMap {m => m.group(1)} mkString ""
      if(whidSession != whidArg) println(line)
  }

The following is the problematic code, which iterates on the letters within the matching strings, thus printing the line as many times as there are different letters in the two values:

  /**
   * This would compare letters, regardless of the use of mkString.. even without the flatMap step.
   */

  val whidTuples = for {
    line <- file.getLines().toList
    whidSession <- rWhidSession.findAllMatchIn(line) flatMap {m => m.group(1) mkString ""}
    whidArg <- rWhidEOL.findAllMatchIn(line) flatMap {m => m.group(1) mkString ""} if whidArg != whidSession 
  } yield line

Upvotes: 2

Views: 710

Answers (1)

som-snytt
som-snytt

Reputation: 39577

To check that corresponding matches are equal:

scala> val ss = "foo/foo" :: "bar/bar" :: "foo/bar" :: Nil
ss: List[String] = List(foo/foo, bar/bar, foo/bar)

scala> val ra = "(.*)/.*".r ; val rb = ".*/(.*)".r
ra: scala.util.matching.Regex = (.*)/.*
rb: scala.util.matching.Regex = .*/(.*)

scala> for (s <- ss; ra(x) = s; rb(y) = s if x != y) yield s
res0: List[String] = List(foo/bar)

but allow multiple matches on a line:

scala> val ss = "foo/foo" :: "bar/bar" :: "baz/baz foo/bar" :: Nil
ss: List[String] = List(foo/foo, bar/bar, baz/baz foo/bar)

this would still compare the first matches:

scala> val ra = """(\w*)/\w*""".r.unanchored ; val rb = """\w*/(\w*)""".r.unanchored
ra: scala.util.matching.UnanchoredRegex = (\w*)/\w*
rb: scala.util.matching.UnanchoredRegex = \w*/(\w*)

scala> for (s <- ss; ra(x) = s; rb(y) = s if x != y) yield s
res2: List[String] = List()

so compare all matches:

scala> val ra = """(\w*)/\w*""".r ; val rb = """\w*/(\w*)""".r
ra: scala.util.matching.Regex = (\w*)/\w*
rb: scala.util.matching.Regex = \w*/(\w*)

scala> for (s <- ss; ma <- ra findAllMatchIn s; mb <- rb findAllMatchIn s; ra(x) = ma; rb(y) = mb if x != y) yield s
res3: List[String] = List(baz/baz foo/bar, baz/baz foo/bar, baz/baz foo/bar)

or

scala> for (s <- ss; (ma, mb) <- (ra findAllMatchIn s) zip (rb findAllMatchIn s); ra(x) = ma; rb(y) = mb if x != y) yield s
res4: List[String] = List(baz/baz foo/bar)

scala> for (s <- ss; (ra(x), rb(y)) <- (ra findAllMatchIn s) zip (rb findAllMatchIn s) if x != y) yield s
res5: List[String] = List(baz/baz foo/bar)

where the match ra(x) = ma should not be re-evaluating the regex but just doing ma group 1.

Upvotes: 1

Related Questions