Include the splitting pattern/token from scala regex.split()

Question

scala.util.matching.Regex appears to have only a single split() method, which discards the matches and returns only the non-matching segments of the input string:

val str = "Here is some stuff PAT and second token PAT and third token PAT and fourth"
val r = "PAT".r
r.split(str)

res14: Array[String] = Array("Here is some stuff ", " and second token ", " and third token ", " and fourth")

So is there another approach commonly used to retain the tokens in the returned list?

Note: the splitting patterns I use for actual work are somewhat complicated and certainly not constants like the above example. Therefore, simply inserting alternating constant values (or zipping them) would not suffice.

Update: Here is a more representative regex:

val str = "Here is some stuff PAT and second token PAT and third token or something else and fourth"
val r = "(PAT|something else)".r
r.split(str)

res14: Array[String] = Array("Here is some stuff ", " and second token ", " and third token or ", " and fourth")

Wiktor Stribiżew · Accepted Answer

For a simple pattern that does not contain subpatterns of indefinite width (for example, quantifiers such as * or +), you can use a lookbehind/lookahead solution:

val str = "Here is some stuff PAT and second token PAT and third token PAT and fourth"
val r = "((?<=PAT)|(?=PAT))".r
print(r.split(str).toList)

Output of the sample demo: List(Here is some stuff , PAT,  and second token , PAT,  and third token , PAT,  and fourth)

The idea is to match only the empty strings immediately before PAT, with (?=PAT), and immediately after it, with (?<=PAT), and to split at those positions. Unfortunately, there is no handy feature that splits on a regex with a capturing group and keeps the captured text as an element of the resulting array/list.
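The same lookaround idea should carry over to the alternation from the question's update, since both alternatives have a fixed length. A minimal sketch, with the input written on a single line; the names text, delims, around, and parts are illustrative and not from the original post:

```scala
val text = "Here is some stuff PAT and second token PAT and third token or something else and fourth"

// The delimiter alternatives; each one has a fixed length, so the
// alternation can be used inside a lookbehind.
val delims = "PAT|something else"

// Match the empty position right after a delimiter (lookbehind)
// or right before one (lookahead), and split only there.
val around = s"(?<=$delims)|(?=$delims)".r

val parts = around.split(text).toList
// parts alternates between plain text and delimiters:
// "Here is some stuff ", "PAT", " and second token ", "PAT",
// " and third token or ", "something else", " and fourth"
```

Note that the surrounding spaces stay attached to the non-delimiter segments, because the split points are zero-width.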

As an alternative, use a matching regex with findAllIn and collect the tokens yourself. Another option is to insert a temporary one-character delimiter before or after each occurrence of the delimiting pattern and then split on that character.
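A rough sketch of the matching approach, using findAllMatchIn (a sibling of findAllIn that exposes match offsets). The helper name splitKeepingDelimiters is hypothetical, not a library method, and the sketch assumes the pattern never matches the empty string:

```scala
import scala.util.matching.Regex

// Hypothetical helper (not part of the standard library): splits `input`
// on `pattern` but keeps every matched delimiter as its own element.
// Assumes `pattern` never matches the empty string.
def splitKeepingDelimiters(pattern: Regex, input: String): List[String] = {
  val pieces = List.newBuilder[String]
  var last = 0 // end offset of the previous match
  for (m <- pattern.findAllMatchIn(input)) {
    // Text between the previous match and this one, if any.
    if (m.start > last) pieces += input.substring(last, m.start)
    // The delimiter itself.
    pieces += m.matched
    last = m.end
  }
  // Whatever follows the final match.
  if (last < input.length) pieces += input.substring(last)
  pieces.result()
}

val str = "Here is some stuff PAT and second token PAT and third token or something else and fourth"
val tokens = splitKeepingDelimiters("PAT|something else".r, str)
```

Because this walks the matches rather than relying on lookbehind, it should also work for delimiter patterns of indefinite width.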
