bigyihsuan

Golang `regexp`: Splitting a string into syllables of the form `CV(N)(C)`

I am trying to split strings that contain multiple syllables of the form CV(N)(C), where

- C = `ptk`
- V = `aeiou`
- N = `mn`

and elements in parentheses are optional, like the regex `?` quantifier.

My initial thought was that, in regex form, this is

```
[ptk][aeiou][mn]?[ptk]?
```
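As a side note, repeating this pattern and anchoring it at both ends can at least check whether a whole word is a well-formed sequence of syllables, even though it does not split the word. The sketch below assumes the pattern above; the variable name `wellFormed` is hypothetical. Because Go's RE2 engine reports a match whenever *any* division into syllables covers the whole word, the greediness of `[ptk]?` does not affect this yes/no answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// wellFormed (hypothetical name) accepts one or more CV(N)(C) syllables and,
// because of ^ and $, only when they cover the entire word.
var wellFormed = regexp.MustCompile(`^(?:[ptk][aeiou][mn]?[ptk]?)+$`)

func main() {
	for _, w := range []string{"tapa", "tankpamt", "apa", "tankk"} {
		// "apa" lacks an initial consonant; "tankk" ends in a stray consonant.
		fmt.Println(w, wellFormed.MatchString(w))
	}
}
```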

For example, these words should match, and each should be "syllabized" (split into separate elements in the output) as shown on the right:

```
ta        => ta
tan       => tan
tank      => tank
tapa      => ta.pa
tanpa     => tan.pa
tankpa    => tank.pa
tapam     => ta.pam
tanpam    => tan.pam
tankpam   => tank.pam
tapamt    => ta.pamt
tanpamt   => tan.pamt
tankpamt  => tank.pamt
tapitetot => ta.pi.te.tot
```

Here is my code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	words := []string{"ta", "tan", "tank", "tapa", "tanpa", "tankpa", "tapam", "tanpam", "tankpam", "tapamt", "tanpamt", "tankpamt", "tapitetot"}
	expected := []string{"ta", "tan", "tank", "ta.pa", "tan.pa", "tank.pa", "ta.pam", "tan.pam", "tank.pam", "ta.pamt", "tan.pamt", "tank.pamt", "ta.pi.te.tot"}

	// Character classes for consonants, vowels, and nasals.
	C := "[ptk]"
	V := "[aeiou]"
	N := "[mn]"
	// One syllable: C V, then an optional N, then an optional C.
	cvnc := regexp.MustCompile(fmt.Sprintf("(%s)(%s)(%s)?(%s)?", C, V, N, C))
	fmt.Println(cvnc)

	// Compare the expected syllables with all non-overlapping matches.
	for i := range words {
		fmt.Println(words[i], "\n    expect", strings.Split(expected[i], "."), "\n    got   ", cvnc.FindAllString(words[i], -1))
	}
}
```

As is, it does not syllabize the multisyllabic test cases correctly. For example, `tanpa` yields `tanp` (one element) rather than the expected `tan.pa` (two elements), and `tapitetot` gives `tap.tet` rather than `ta.pi.te.tot`.

How can I modify my regex so that it can syllabize these kinds of strings correctly?

In case the problem statement is unclear: the trouble is the (N)(C) portion of the syllable structure. The regex greedily consumes a trailing C, with or without a preceding N, even when that C is actually the first consonant of the next syllable (in `tapa`, the `p` is taken as a coda, leaving `a` unmatched). Is there a way to implement this check?
