I am trying to split strings that contain one or more syllables of the form CV(N)(C), where
C = ptk
V = aeiou
N = mn
and elements in parentheses are optional, similar to ? in a regex.
My initial thought is that, in regex form, this is
[ptk][aeiou][mn]?[ptk]?
For example, these words all match; the right-hand side shows how each is supposed to be "syllabized" (each syllable appearing as a separate element in the output):
ta => ta
tan => tan
tank => tank
tapa => ta.pa
tanpa => tan.pa
tankpa => tank.pa
tapam => ta.pam
tanpam => tan.pam
tankpam => tank.pam
tapamt => ta.pamt
tanpamt => tan.pamt
tankpamt => tank.pamt
tapitetot => ta.pi.te.tot
Here is my code:
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	words := []string{"ta", "tan", "tank", "tapa", "tanpa", "tankpa", "tapam", "tanpam", "tankpam", "tapamt", "tanpamt", "tankpamt", "tapitetot"}
	expected := []string{"ta", "tan", "tank", "ta.pa", "tan.pa", "tank.pa", "ta.pam", "tan.pam", "tank.pam", "ta.pamt", "tan.pamt", "tank.pamt", "ta.pi.te.tot"}
	C := "[ptk]"
	V := "[aeiou]"
	N := "[mn]"
	// One syllable: C V (N) (C), where N and the final C are optional.
	cvnc := regexp.MustCompile(fmt.Sprintf("(%s)(%s)(%s)?(%s)?", C, V, N, C))
	fmt.Println(cvnc)
	for i := range words {
		fmt.Println(words[i], "\n expect", strings.Split(expected[i], "."), "\n got ", cvnc.FindAllString(words[i], -1))
	}
}
As is, it will not syllabize the multi-syllabic test cases correctly. For example, tanpa yields tanp (1 element) rather than the expected tan.pa (2 elements), and tapitetot gives tap.tet rather than ta.pi.te.tot.
How can I modify my regex so that it can syllabize these kinds of strings correctly?
In case the problem statement is not clear: the problem is the (N)(C) portion of the syllable structure above. The regex greedily consumes the trailing C, whether or not an N precedes it, without regard to whether that C is actually the start of the next syllable. Is there a way to implement this check?