Reputation: 63032
The scala.util.matching.Regex class appears to have only a single split() method, whose behavior is to discard the matches and return only the non-matching segments of the input string:
val str = "Here is some stuff PAT and second token PAT and third token PAT and fourth"
val r = "PAT".r
r.split(str)
res14: Array[String] = Array("Here is some stuff ", " and second token ", " and third token ", " and fourth")
So is there another approach commonly used to retain the tokens in the returned list?
Note: the splitting patterns I use for actual work are somewhat complicated and certainly not constants like the above example. Therefore, simply inserting alternating constant values (or zipping them) would not suffice.
Update: here is a more representative regex:
val str = "Here is some stuff PAT and second token PAT and third token or something else and fourth"
val r = "(PAT|something else)".r
r.split(str)
res14: Array[String] = Array("Here is some stuff ", " and second token ", " and third token or ", " and fourth")
Upvotes: 1
Views: 244
Reputation: 626699
For a simple pattern with no indefinite-width subpatterns (Java lookbehind requires a bounded width, so no unbounded quantifiers such as * or +), you can use a lookbehind/lookahead solution:
val str = "Here is some stuff PAT and second token PAT and third token PAT and fourth"
val r = "((?<=PAT)|(?=PAT))".r
print(r.split(str).toList)
Output of the sample demo: List(Here is some stuff , PAT,  and second token , PAT,  and third token , PAT,  and fourth)
The idea is to match the empty strings immediately after (?<=PAT) and immediately before (?=PAT) each occurrence of PAT, and to split only there. Unfortunately, there is no handy feature for splitting with a regex that contains a capturing group and keeping the captured text as an element of the resulting array/list.
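The same lookaround trick should carry over to the alternation from the question's update, as long as every alternative has bounded width. A minimal sketch follows; aroundDelimiter is a hypothetical helper name, not something provided by scala.util.matching:

```scala
import scala.util.matching.Regex

// Hypothetical helper (not part of the standard library): builds a regex that
// matches the empty string right after or right before any match of `delim`.
// Assumption: `delim` has bounded width, because Java lookbehind requires it.
def aroundDelimiter(delim: String): Regex =
  s"(?<=${delim})|(?=${delim})".r

val str = "Here is some stuff PAT and second token PAT and third token or something else and fourth"

// A non-capturing group keeps the alternation self-contained inside each lookaround.
val tokens = aroundDelimiter("(?:PAT|something else)").split(str).toList
// tokens: List("Here is some stuff ", "PAT", " and second token ", "PAT",
//              " and third token or ", "something else", " and fourth")
```

Both the delimiters and the text between them come back as separate elements, in their original order.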
As an alternative, use a matching regex with findAllIn. Or insert temporary one-character delimiters before and after each match of the delimiting pattern, then split on those delimiters.
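Both alternatives can be sketched roughly as follows. The helper names splitKeeping and splitViaSentinel are hypothetical, the first uses findAllMatchIn (a sibling of findAllIn that yields Match objects with positions), and the second assumes the sentinel character never occurs in the input. Neither relies on lookbehind, so indefinite-width patterns should work too, provided the pattern never matches the empty string:

```scala
import scala.util.matching.Regex

// Hypothetical helper: walks over the matches and collects, in order, the text
// between delimiters and the delimiters themselves.
def splitKeeping(delim: Regex, input: String): List[String] = {
  val out = List.newBuilder[String]
  var last = 0
  for (m <- delim.findAllMatchIn(input)) {
    if (m.start > last) out += input.substring(last, m.start) // text before the delimiter
    out += m.matched                                          // the delimiter itself
    last = m.end
  }
  if (last < input.length) out += input.substring(last)       // trailing text
  out.result()
}

// Hypothetical helper: wraps every match in a sentinel character, then splits on it.
// Assumption: `sentinel` does not appear anywhere in `input`.
def splitViaSentinel(delim: Regex, input: String, sentinel: Char = '\u0000'): List[String] =
  delim
    .replaceAllIn(input, m => Regex.quoteReplacement(s"${sentinel}${m.matched}${sentinel}"))
    .split(sentinel)
    .filter(_.nonEmpty) // drop empty pieces produced by adjacent or leading delimiters
    .toList

val str = "Here is some stuff PAT and second token PAT and third token or something else and fourth"
val r = "PAT|something else".r

val viaMatches  = splitKeeping(r, str)
val viaSentinel = splitViaSentinel(r, str)
val digits      = splitKeeping("""\d+""".r, "a12b345c") // List("a", "12", "b", "345", "c")
```

The match-walking version avoids any assumption about the input's characters, which is why it is probably the safer default of the two.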
Upvotes: 3