Reputation: 6094
I'm using Scala extractors (i.e. a Regex inside a pattern match) to identify doubles and longs, as shown below.
My question is: why does the Regex apparently fail when used in a pattern match, whilst it clearly delivers the expected results when used in a chain of if/then/else expressions?
val LONG = """^(0|-?[1-9][0-9]*)$"""
val DOUBLE = """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$"""
val scalaLONG : scala.util.matching.Regex = LONG.r
val scalaDOUBLE : scala.util.matching.Regex = DOUBLE.r
val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  text match {
    case scalaLONG(long)     => s"Long"
    case scalaDOUBLE(double) => s"Double"
    case _                   => s"String"
  })
// Results types1: Seq[String] = List("String", "Long", "String", "String", "String")
val types2 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  if (scalaDOUBLE.findFirstIn(text).isDefined) "Double"
  else if (scalaLONG.findFirstIn(text).isDefined) "Long"
  else "String")
// Results types2: Seq[String] = List("String", "Long", "Double", "Double", "Double")
As you can see above, types2 delivers the expected results, whilst types1 returns "String" where "Double" is expected, apparently pointing to a failure in the Regex processing.
EDIT: With help from @alex-savitsky and @leo-c, I've arrived at the version shown below, which works as expected. However, I have to remember to provide an empty argument list in the pattern match, otherwise it gives wrong results. This looks error-prone to me.
val LONG = """^(?:0|-?[1-9][0-9]*)$"""
val DOUBLE = """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$"""
val scalaLONG : scala.util.matching.Regex = LONG.r
val scalaDOUBLE : scala.util.matching.Regex = DOUBLE.r
val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  text match {
    case scalaLONG()   => s"Long"
    case scalaDOUBLE() => s"Double"
    case _             => s"String"
  })
// Results types1: Seq[String] = List("String", "Long", "Double", "Double", "Double")
val types2 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  if (scalaDOUBLE.findFirstIn(text).isDefined) "Double"
  else if (scalaLONG.findFirstIn(text).isDefined) "Long"
  else "String")
// Results types2: Seq[String] = List("String", "Long", "Double", "Double", "Double")
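If the need to keep the argument list in sync with the capture groups feels fragile, one possible alternative is to skip the extractor and ask the underlying java.util.regex.Pattern for a whole-input match. The sketch below assumes the original regexes with capturing groups; fullyMatches, classify and types3 are hypothetical names introduced only for this illustration.

```scala
import scala.util.matching.Regex

val scalaLONG: Regex = """^(0|-?[1-9][0-9]*)$""".r
val scalaDOUBLE: Regex =
  """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r

// Hypothetical helper: true only when the regex matches the entire text.
// Capture groups are irrelevant here because nothing is extracted.
def fullyMatches(regex: Regex, text: String): Boolean =
  regex.pattern.matcher(text).matches()

// Hypothetical classifier: same order of checks as types1 (Long first).
def classify(text: String): String =
  if (fullyMatches(scalaLONG, text)) "Long"
  else if (fullyMatches(scalaDOUBLE, text)) "Double"
  else "String"

val types3 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(classify)
```

Because no groups are bound, adding or removing parentheses in the regex does not change the result of classify.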
EDIT: OK... error-prone as it may be, this is an extractor pattern, which uses unapply behind the scenes, and in this case we have to pass the right arguments to unapply. @alex-savitsky is using _* in his edit, which explicitly states the intention of dropping all capture groups. Looks good to me.
Upvotes: 2
Views: 199
Reputation: 2371
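A Regex used as a pattern in a match expression has to match the whole input, while findFirstIn looks for the first matching substring and can therefore match partial input contents, sometimes resulting in more matches. The ^ and $ anchors are still honored by findFirstIn, but because alternation binds more loosely than the anchors, the NaN alternative in your pattern is not anchored at all.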
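The difference between the two kinds of matching can be seen with a small sketch. The regexes digits and anchoredDigits and the helper names are hypothetical, chosen only for this illustration; unanchored is the Regex method that makes the extractor behave like a substring search. The snippet also checks how findFirstIn treats explicit anchors.

```scala
import scala.util.matching.Regex

// Hypothetical regexes without capture groups.
val digits: Regex = """[0-9]+""".r
val anchoredDigits: Regex = """^[0-9]+$""".r

// findFirstIn searches for a matching substring.
val partial: Option[String] = digits.findFirstIn("abc123")

// With explicit anchors, findFirstIn only succeeds on a whole-input match.
val anchoredPartial: Option[String] = anchoredDigits.findFirstIn("abc123")

// As a pattern, the regex must match the entire input.
def isAllDigits(s: String): Boolean = s match {
  case digits() => true
  case _        => false
}

// An unanchored copy of the regex matches anywhere inside the input.
val loose = digits.unanchored
def containsDigits(s: String): Boolean = s match {
  case loose() => true
  case _       => false
}
```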
If your intention was to match the whole input, put your ^ at the beginning of the regex, as in val DOUBLE = """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""". For types1 to match the types correctly, the pattern also has to account for the capture groups, as the test case below shows.
EDIT: Here's my test case for your question
object Test extends App {
  val regex = """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$""".r
  println(Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map {
    case regex() => "Double"
    case _       => "String"
  })
}
results in List(String, String, Double, Double, Double)
As you see, the non-capturing groups make all the difference.
If you still want to use capturing groups, you can use _* to ignore the capture results:
object Test extends App {
  val regex = """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r
  println(Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map {
    case regex(_*) => "Double"
    case _         => "String"
  })
}
Upvotes: 1
Reputation: 22449
Since you defined multiple capturing groups in scalaDOUBLE, you need to provide a matching number of arguments in the corresponding match case, as in the following:
val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  text match {
    case scalaLONG(long)                 => s"Long"
    case scalaDOUBLE(d1, d2, d3, d4, d5) => s"Double"
    case _                               => s"String"
  })
// types1: Seq[String] = List(String, Long, Double, Double, Double)
You can examine the captured groups, as follows:
"-3.0E-05" match { case scalaDOUBLE(d1, d2, d3, d4, d5) => (d1, d2, d3, d4, d5) }
// res1: (String, String, String, String, String) = (3.0,null,3.0,null,E-05)
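As the output above shows, groups that did not participate in the match come back as null. A minimal sketch of one way to handle that, assuming the original scalaDOUBLE regex with its five capturing groups, is to wrap the group of interest in Option. The helper exponentOf is a hypothetical name used only for this illustration.

```scala
import scala.util.matching.Regex

val scalaDOUBLE: Regex =
  """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r

// Hypothetical helper: returns the exponent part, if the text is a double
// that has one. Option(null) evaluates to None, which hides the null
// produced by a group that did not take part in the match.
def exponentOf(text: String): Option[String] = text match {
  case scalaDOUBLE(_, _, _, _, exp) => Option(exp)
  case _                            => None
}
```

With this helper, a double without an exponent and a non-numeric string both produce None, so callers that need to tell them apart would have to return something richer.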
Upvotes: 1