I have two regular expression extractors, one for .java files and the other for .scala files:
val JavaFileRegEx =
  """\S*
     \s+
     //
     \s{1}
     ([^\.java]+)
     \.java
  """.replaceAll("(\\s)", "").r

val ScalaFileRegEx =
  """\S*
     \s+
     //
     \s{1}
     ([^\.scala]+)
     \.scala
  """.replaceAll("(\\s)", "").r
I want to use these extractors to extract a Java file name and a Scala file name from the example code below:
val string1 = " // Tester.java"
val string2 = " // Hello.scala"

string1 match {
  case JavaFileRegEx(fileName1) => println(" Java file: " + fileName1)
  case other => println(other + "--NO_MATCH")
}

string2 match {
  case ScalaFileRegEx(fileName2) => println(" Scala file: " + fileName2)
  case other => println(other + "--NO_MATCH")
}
I get the following output, which shows that the .java file matched but the .scala file did not:
Java file: Tester
// Hello.scala--NO_MATCH
Why does the .java file match while the .scala file does not?
NOTE: [] denotes a character class, which matches only a single character. [^...] denotes a negated character class, which matches any single character except the ones listed inside it.
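A small Scala sketch to check both points above: a negated class consumes exactly one character, and [^\.java] accepts the same characters as [^.jav]. The object and value names are hypothetical; only String.matches (which requires the whole input to match) is used.

```scala
object CharClassDemo {
  // The negated class from the question; triple quotes keep the backslash literal.
  val negated = """[^\.java]"""

  // String.matches requires the WHOLE input to match the pattern.
  val oneChar  = "T".matches(negated)  // 'T' is not in {., j, a, v}
  val twoChars = "Te".matches(negated) // a class covers only one character
  val excluded = "v".matches(negated)  // 'v' is listed in the class

  // The duplicate 'a' changes nothing: both classes accept the same characters.
  val candidates = (('a' to 'z') ++ ('A' to 'Z')) :+ '.'
  val sameSet = candidates.forall { c =>
    c.toString.matches(negated) == c.toString.matches("[^.jav]")
  }

  def main(args: Array[String]): Unit =
    // prints: oneChar=true twoChars=false excluded=false sameSet=true
    println(s"oneChar=$oneChar twoChars=$twoChars excluded=$excluded sameSet=$sameSet")
}
```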
In your first regex,

\S*\s+//\s{1}([^\.java]+)\.java

\S* matches nothing, because the string starts with a space.
\s+ matches that leading space.
// matches // literally.
\s{1} matches the next space.
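As a sketch of the steps above (object and value names are hypothetical), the question's multi-line pattern can be collapsed with the same replaceAll call, and the first four pieces can be run on their own to see what is left for the capturing group. findPrefixMatchOf anchors the match at the start of the input, and Match.after returns the text following the match.

```scala
object PrefixDemo {
  // The question's multi-line pattern; replaceAll("(\\s)", "") deletes every
  // whitespace character, leaving the compact regex discussed in this answer.
  val verbose =
    """\S*
       \s+
       //
       \s{1}
       ([^\.java]+)
       \.java
    """
  val compact = verbose.replaceAll("(\\s)", "")

  // Only the first four pieces: \S*  \s+  //  \s{1}
  val prefix = """\S*\s+//\s{1}""".r

  // What remains of the input after those four pieces have matched.
  val remaining: Option[String] =
    prefix.findPrefixMatchOf(" // Tester.java").map(_.after.toString)

  def main(args: Array[String]): Unit = {
    println(compact)   // \S*\s+//\s{1}([^\.java]+)\.java
    println(remaining) // Some(Tester.java)
  }
}
```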
You are using [^\.java], which matches any character except ., j, a, v, or a (the second a is redundant), so it can be written as [^.jav].
So the remaining string to be tested is Tester.java. (Un)luckily, none of the characters in Tester is ., j, a, or v, so [^.jav]+ consumes all of Tester and stops only when it reaches the dot. Then \.java matches .java, and the whole regex succeeds.
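To show that the Java extractor works only by luck, here is a sketch (object, method, and value names are hypothetical) that runs the question's compact Java regex on the original input and on a hypothetical file name, Main.java, whose base name contains an a. Pattern matching with a Scala Regex requires the whole string to match.

```scala
object JavaLuckDemo {
  // The question's Java extractor in its compact form.
  val JavaFileRegEx = """\S*\s+//\s{1}([^\.java]+)\.java""".r

  // Returns the captured file name, or None if the whole string does not match.
  def javaName(s: String): Option[String] = s match {
    case JavaFileRegEx(name) => Some(name)
    case _                   => None
  }

  // "Tester" contains none of '.', 'j', 'a', 'v', so the class consumes it.
  val tester = javaName(" // Tester.java")

  // Hypothetical input: "Main" contains 'a', so the class stops after "M" and the match fails.
  val mainFile = javaName(" // Main.java")

  def main(args: Array[String]): Unit = {
    println(tester)   // Some(Tester)
    println(mainFile) // None
  }
}
```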
In your second regex,

\S*\s+//\s{1}([^\.scala]+)\.scala

\S* matches nothing, because the string starts with a space.
\s+ matches that leading space.
// matches // literally.
\s{1} matches the next space.
Now you are using [^\.scala], which matches any character except ., s, c, a, l, or a, so it can be written as [^.scla].
The remaining string is Hello.scala, but (un)luckily Hello contains l, which the character class does not allow. [^.scla]+ therefore stops after He, \.scala cannot match the l that follows, and the regex fails.
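A sketch of the failure (object and value names are hypothetical): findPrefixOf shows how much of Hello.scala the question's class consumes before it stops, and a full pattern match with the question's Scala regex confirms that the input is rejected.

```scala
object ScalaFailDemo {
  // The question's Scala extractor in its compact form.
  val ScalaFileRegEx = """\S*\s+//\s{1}([^\.scala]+)\.scala""".r

  // How far the question's character class gets into "Hello.scala" before stopping at 'l'.
  val consumed: Option[String] = """[^\.scala]+""".r.findPrefixOf("Hello.scala")

  // A Regex pattern match must cover the whole string, so this falls through.
  val matched: Boolean = " // Hello.scala" match {
    case ScalaFileRegEx(_) => true
    case _                 => false
  }

  def main(args: Array[String]): Unit = {
    println(consumed) // Some(He)
    println(matched)  // false
  }
}
```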
How to correct it?
Change only the capturing group of your regex:

\S*\s+//\s{1}([^.]*)\.java

Here [^.]* matches any run of characters except . (inside a character class, . is a literal dot). You can also use \w instead of [^.] here.

\S*\s+//\s{1}([^.]*)\.scala
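Here is a sketch of the corrected extractors used the same way as in the question. The object and method names are hypothetical, and combining both extractors in one match is only a convenience for the demo. The hypothetical Main.java input, which the original Java regex rejected, now matches as well.

```scala
object FixedExtractors {
  // Corrected patterns: [^.]* accepts any characters except a dot.
  val JavaFileRegEx  = """\S*\s+//\s{1}([^.]*)\.java""".r
  val ScalaFileRegEx = """\S*\s+//\s{1}([^.]*)\.scala""".r

  // Mirrors the question's output format; cases are tried top to bottom.
  def describe(s: String): String = s match {
    case JavaFileRegEx(name)  => "Java file: " + name
    case ScalaFileRegEx(name) => "Scala file: " + name
    case other                => other + "--NO_MATCH"
  }

  def main(args: Array[String]): Unit = {
    println(describe(" // Tester.java")) // Java file: Tester
    println(describe(" // Hello.scala")) // Scala file: Hello
    println(describe(" // Main.java"))   // Java file: Main
  }
}
```

Note that [^.]* also accepts characters such as - or spaces in the name, whereas \w* would reject them, so the choice depends on which file names you expect.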
There is no need for {1} in \s{1}. A plain \s already matches exactly one whitespace character, so you can simply write:

\S*\s+//\s([^.]*)\.java
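A quick sketch (names and sample inputs are hypothetical) that checks the \s{1} and \s forms give identical results. The tab input shows that \s matches any single whitespace character, not only a space.

```scala
import scala.util.matching.Regex

object QuantifierDemo {
  val withQuantifier = """\S*\s+//\s{1}([^.]*)\.java""".r
  val plain          = """\S*\s+//\s([^.]*)\.java""".r

  // Captured file name if the whole input matches, otherwise None.
  def capture(r: Regex, s: String): Option[String] = s match {
    case r(name) => Some(name)
    case _       => None
  }

  val inputs = List(" // Tester.java", " //\tTester.java", " // Hello.scala")

  // True if both patterns agree on every sample input.
  val sameResults = inputs.forall(s => capture(withQuantifier, s) == capture(plain, s))

  def main(args: Array[String]): Unit =
    inputs.foreach(s => println(capture(plain, s)))
}
```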