jojo_Berlin

Reputation: 693

Pattern matching extract String Scala

I want to extract the part of a String that matches one of the two regex patterns I defined:

  //should match R0010, R0100,R0300 etc 
  val rPat="[R]{1}[0-9]{4}".r
  // should match P.25.01.21 , P.27.03.25 etc
  val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r 

When I now define my method to extract the elements as:

  val matcher= (s:String) => s match {case pPat(el)=> println(el) // print the P.25.01.25
                                        case rPat(el)=>println(el) // print R0100 
                                        case _ => println("no match")}

And test it, e.g. with:

  val pSt=" P.25.01.21 - Hello whats going on?"
  matcher(pSt)//prints "no match" but should print P.25.01.21
  val rSt= "R0010  test test 3,870" 
  matcher(rSt) //prints also "no match" but should print R0010
  //check if regex is wrong
  val pHead="P.25.01.21"
  pHead.matches(pPat.toString)//returns true
  val rHead="R0010"
  rHead.matches(rPat.toString)//return true

I'm not sure whether the regex expressions are wrong, but the matches method works on the isolated elements. So what is wrong with the approach?

Upvotes: 3

Views: 1325

Answers (3)

Wiktor Stribiżew

Reputation: 626689

When you use pattern matching with strings, you need to bear in mind that:

  • A pattern created with .r must match the whole string, otherwise no match is returned (the solution is to make the pattern unanchored with .r.unanchored)
  • Once you make it unanchored, watch out for unwanted matches: R[0-9]{4} will match R1234 in CSR123456 (the right fix depends on your real requirements; usually word boundaries \b are enough, or negative lookarounds can be used)
  • Inside a match block, the regex extractor needs a capturing group in the pattern if you want to get a value back (you bound it as el in pPat(el) and rPat(el)).
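To make the first and third points concrete, here is a minimal sketch. The object name AnchoringDemo, the helper extractWith and the three pattern values are hypothetical names chosen for illustration; only .r, .unanchored and regex extractor patterns come from the standard library.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  // Hypothetical example patterns, not the asker's originals.
  val anchored: Regex   = """(R\d{4})""".r      // must match the whole input
  val unanchored: Regex = anchored.unanchored   // may match anywhere in the input
  val noGroup: Regex    = """R\d{4}""".r        // no capturing group at all

  // Returns the captured text, or None when the extractor pattern does not apply.
  def extractWith(pattern: Regex, s: String): Option[String] = s match {
    case pattern(el) => Some(el)
    case _           => None
  }

  def main(args: Array[String]): Unit = {
    println(extractWith(anchored, "R0010"))              // Some(R0010)
    println(extractWith(anchored, "R0010  test test"))   // None: trailing text is not covered
    println(extractWith(unanchored, "R0010  test test")) // Some(R0010)
    println(extractWith(noGroup, "R0010"))               // None: there is no group to bind to el
  }
}
```

The last call shows why the original patterns would still fail after being made unanchored: without a group, the one-argument pattern has nothing to bind.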

So, I suggest the following solution:

val rPat="""\b(R\d{4})\b""".r.unanchored
val pPat="""\b(P\.\d{2}\.\d{2}\.\d{2})\b""".r.unanchored

val matcher = (s: String) => s match {
  case pPat(el) => println(el) // print the P.25.01.25
  case rPat(el) => println(el) // print R0100
  case _        => println("no match")
}

Then,

val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt) // => P.25.01.21
val pSt2_bad=" CP.2334565.01124.212 - Hello whats going on?"
matcher(pSt2_bad) // => no match
val rSt= "R0010  test test 3,870" 
matcher(rSt) // => R0010
val rSt2_bad = "CSR00105  test test 3,870" 
matcher(rSt2_bad) // => no match

Some notes on the patterns:

  • \b - a leading word boundary
  • (R\d{4}) - a capturing group matching R followed by exactly 4 digits
  • \b - a trailing word boundary

Due to the triple quotes used to define the string literal, there is no need to escape the backslashes.
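As a small illustration of the quoting remark, the sketch below compares a triple-quoted literal with its escaped equivalent. QuotingDemo and both value names are hypothetical and exist only for this example.

```scala
object QuotingDemo {
  // The same regex source written two ways.
  val tripleQuoted: String = """\b(R\d{4})\b""" // backslashes are taken literally
  val escaped: String      = "\\b(R\\d{4})\\b"  // each backslash has to be doubled

  def main(args: Array[String]): Unit = {
    println(tripleQuoted == escaped)                          // true
    println(escaped.r.findFirstIn("see R0010 here"))          // Some(R0010)
    println(tripleQuoted.r.findFirstIn("CSR00105 test test")) // None: no word boundary before R
  }
}
```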

Upvotes: 2

If the code is written as follows, it produces the desired output. It follows the Regex API documentation at http://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html

  //should match R0010, R0100,R0300 etc
  val rPat="[R]{1}[0-9]{4}".r
  // should match P.25.01.21 , P.27.03.25 etc
  val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r


  def main(args: Array[String]): Unit = {
    val pSt = " P.25.01.21 - Hello whats going on?"
    // findAllIn returns an iterator over every match found anywhere in the string
    val pPatMatches = pPat.findAllIn(pSt)
    pPatMatches.foreach(println)
    val rSt = "R0010  test test 3,870"
    val rPatMatches = rPat.findAllIn(rSt)
    rPatMatches.foreach(println)
  }

Please, let me know if that works for you.
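If only the first occurrence is needed, a variant of the same idea could use findFirstIn, which returns an Option. This is a sketch under that assumption, not part of the answer above; the object FindFirstDemo and the helper extractCode are hypothetical names, and the patterns are copied unchanged from the question.

```scala
import scala.util.matching.Regex

object FindFirstDemo {
  val rPat: Regex = "[R]{1}[0-9]{4}".r
  val pPat: Regex = "[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r

  // Hypothetical helper: try pPat first, fall back to rPat, None if neither is found.
  def extractCode(s: String): Option[String] =
    pPat.findFirstIn(s).orElse(rPat.findFirstIn(s))

  def main(args: Array[String]): Unit = {
    println(extractCode(" P.25.01.21 - Hello whats going on?")) // Some(P.25.01.21)
    println(extractCode("R0010  test test 3,870"))              // Some(R0010)
    println(extractCode("nothing to find"))                     // None
  }
}
```

Returning an Option leaves it to the caller to decide how to handle the "no match" case instead of printing inside the helper.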

Upvotes: 0

Nyavro

Reputation: 8866

Introduce capturing groups in your patterns and surround them with .* so that the whole string is matched:

val rPat=".*([R]{1}[0-9]{4}).*".r

val pPat=".*([P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}).*".r 

...

scala> matcher(pSt)
P.25.01.21

scala> matcher(rSt)
R0010
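One thing to keep in mind with this approach: the leading .* is greedy, so when a string contains several candidates the group captures the last one. The sketch below illustrates this with a made-up input; GreedyDemo, capture and the reluctant variant are hypothetical additions, not part of the answer.

```scala
import scala.util.matching.Regex

object GreedyDemo {
  val greedy: Regex    = ".*([R]{1}[0-9]{4}).*".r  // pattern as in the answer
  val reluctant: Regex = ".*?([R]{1}[0-9]{4}).*".r // assumed variant with a reluctant prefix

  // Returns the captured group, or None if the whole string does not match.
  def capture(pattern: Regex, s: String): Option[String] = s match {
    case pattern(el) => Some(el)
    case _           => None
  }

  def main(args: Array[String]): Unit = {
    val input = "R0010 then R0200" // made-up input with two candidates
    println(capture(greedy, input))    // Some(R0200): the last occurrence
    println(capture(reluctant, input)) // Some(R0010): the first occurrence
  }
}
```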

Upvotes: 1
