Will

Reputation: 145

Extract regex capture groups into an Array in Scala

I have the following problem. I have a string:

val line:String = "PE018201804527901"

that matches this regex:

(.{2})(.{4})(.{9})(.{2})

I need to extract each capture group from the match into an Array.

The result should be:

Array("PE", "0182", "018045279", "01")

I tried this:

val regex = """(.{2})(.{4})(.{9})(.{2})""".r
val x = regex.findAllIn(line).toArray

but it doesn't work: x contains the whole matched string rather than the individual groups.

Upvotes: 6

Views: 4874

Answers (3)

Will

Reputation: 145

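@sheunis, your solution was very helpful. In the end I solved it with this method:

import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

// Collects every capture group of every match in `line` into a single array.
def extractFromRegex(regex: Regex, line: String): Array[String] = {
  val list = ListBuffer[String]()
  for {
    m <- regex.findAllIn(line).matchData // one Match per occurrence
    e <- m.subgroups                     // the capture groups of that match
  } list += e
  list.toArray
}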

I did it this way because your solution, with this code:

val line: String = """PE0182"""
val regex = """(.{2})(.{4})""".r
val t = regex.findAllIn(line).subgroups.toArray

throws the following exception:

Exception in thread "main" java.lang.IllegalStateException: No match available
at java.util.regex.Matcher.start(Matcher.java:372)
at scala.util.matching.Regex$MatchIterator.start(Regex.scala:696)
at scala.util.matching.Regex$MatchData$class.group(Regex.scala:549)
at scala.util.matching.Regex$MatchIterator.group(Regex.scala:671)
at scala.util.matching.Regex$MatchData$$anonfun$subgroups$1.apply(Regex.scala:553)
at scala.util.matching.Regex$MatchData$$anonfun$subgroups$1.apply(Regex.scala:553)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at scala.util.matching.Regex$MatchData$class.subgroups(Regex.scala:553)
at scala.util.matching.Regex$MatchIterator.subgroups(Regex.scala:671)

Upvotes: 5

Wiktor Stribiżew

Reputation: 626825

Note that findAllIn does not anchor the regex pattern, so it will also find a match inside a much longer string. If you only want to accept strings that are exactly 17 characters long, you can use a match block, which requires the whole string to match:

val line = "PE018201804527901"
val regex = """(.{2})(.{4})(.{9})(.{2})""".r
val results = line match {
  case regex(g1, g2, g3, g4) => Array(g1, g2, g3, g4)
  case _ => Array[String]()
}
// Demo printing
results.foreach { m =>
  println(m)
} 
// PE
// 0182
// 018045279
// 01

See a Scala demo.

It also handles the no-match case cleanly by returning an empty string array.
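
To make the anchoring difference concrete, here is a small sketch that runs the same pattern both ways on a 20-character input. The object and method names (AnchoringDemo, wholeMatch, firstMatch) and the sample input are invented for the illustration.

```scala
object AnchoringDemo {
  val regex = """(.{2})(.{4})(.{9})(.{2})""".r

  // Pattern matching with the regex as an extractor requires the
  // entire string to match, so longer inputs are rejected.
  def wholeMatch(s: String): Option[List[String]] = s match {
    case regex(g1, g2, g3, g4) => Some(List(g1, g2, g3, g4))
    case _                     => None
  }

  // findFirstMatchIn searches anywhere in the string, so it still
  // finds the first 17 characters of a longer input.
  def firstMatch(s: String): Option[List[String]] =
    regex.findFirstMatchIn(s).map(_.subgroups)
}
```

For "PE018201804527901XYZ", wholeMatch returns None, while firstMatch returns Some(List(PE, 0182, 018045279, 01)).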

If you need to get all matches and all groups, then you will need to grab the groups into a list and then add the list to a list buffer (scala.collection.mutable.ListBuffer):

import scala.collection.mutable.ListBuffer

val line = "PE018201804527901%E018201804527901"
val regex = """(.{2})(.{4})(.{9})(.{2})""".r
val results = ListBuffer[List[String]]()

val mi = regex.findAllIn(line)
while (mi.hasNext) {
  mi.next() // advance to the next match; its groups are then read from the iterator
  results += List(mi.group(1), mi.group(2), mi.group(3), mi.group(4))
}
// Demo printing
results.foreach { m =>
  println("------")
  println(m)
  m.foreach { l => println(l) }
}

Results:

------
List(PE, 0182, 018045279, 01)
PE
0182
018045279
01
------
List(%E, 0182, 018045279, 01)
%E
0182
018045279
01

See this Scala demo
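
As an alternative to the explicit while loop, the same collection can probably be written with findAllMatchIn, which yields one Match object per occurrence. This is only a sketch; AllGroups, groupsPerMatch and allGroupsFlat are names chosen for the example.

```scala
object AllGroups {
  val regex = """(.{2})(.{4})(.{9})(.{2})""".r

  // One inner list of capture groups per match.
  def groupsPerMatch(line: String): List[List[String]] =
    regex.findAllMatchIn(line).map(_.subgroups).toList

  // All capture groups of all matches in a single flat array,
  // which is the shape asked for in the question.
  def allGroupsFlat(line: String): Array[String] =
    regex.findAllMatchIn(line).flatMap(_.subgroups).toArray
}
```

Calling AllGroups.groupsPerMatch("PE018201804527901%E018201804527901") should produce the same two lists as the loop above, without a mutable buffer.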

Upvotes: 5

sheunis

Reputation: 1544

regex.findAllIn(line).subgroups.toArray

Upvotes: 8
