Finding every pronounciation permutation of a word

Question

I am attempting to write a function that generates every possible pronunciation permutation of a word using a pronunciation dictionary.

# Dictionary
sounddef <- t(matrix(strsplit('c,k,c,s,ou,uh,n,n,t,t,r,r,y,ee,w,w,o,oh,o,uh,th,th,s,s,m,m',",")[[1]], nrow = 2))

# The first column is the written letter and the second is a possible pronunciation.

match_rec <- function(x, sounddef) {
  if (!nzchar(x)) return("")
  returner <- NULL
  for (i in 1:nrow(sounddef)) {
    v <- sounddef[i,]
    char <- paste0("^",v[1])

    if (grepl(char, x)) 
      returner  <- c(returner, paste0(v[1],'->',v[2], ",", 
                                      match_rec(gsub(char, "", x), sounddef), collapse=""))
  }
  returner
}

# Unfortunately this does not return the right values
match_rec("country", sounddef)
[1] "c->k,ou->uh,n->n,t->t,r->r,y->ee,c->k,o->oh,c->k,o->uh,"
[2] "c->s,ou->uh,n->n,t->t,r->r,y->ee,c->s,o->oh,c->s,o->uh,"

The values it was supposed to return are:

[1] "c->k,ou->uh,n->n,t->t,r->r,y->ee"
[2] "c->s,ou->uh,n->n,t->t,r->r,y->ee"

As there are two possible ways to pronounce c in the dictionary.

Simon Jackson · Accepted Answer

I tackled this in a slightly different way, and added some support for corner cases like the same characters appearing multiple times, and the need to select between multiple matches (by taking the longest). Note that I used a few functions from stringr and purrr packages. I'm sure the function can be optimized, but might get you off to a start...

library(stringr)
library(purrr)

match_rec <- function(x, sound_dict) {
  if (!nzchar(x)) return("")

  # Helper variables
  key_matches <- c() # This can be optimized if number of possible matches is known
  char_keys   <- sound_dict[,1]
  unique_keys <- unique(char_keys)

  while(nzchar(x)) {
    # Find matches to beginning of string
    matches <- str_detect(x, paste0("^", unique_keys))
    if (any(matches)) {
      # Take the longest match
      char_match <- max(unique_keys[matches])
      key_matches <- c(key_matches, char_match)
      x <- str_sub(x, 1 + nchar(char_match))
    } else {
      x <- str_sub(x, 2)
    }
  }

  # Return all pronunciation permutations
  expand.grid(
    map(key_matches, ~ paste(., sound_dict[. == char_keys, 2], sep = "->"))
  )
}

Some examples of output...

sounddef <- t(matrix(strsplit('c,k,c,s,ou,uh,n,n,t,t,r,r,y,ee,w,w,o,oh,o,uh,th,th,s,s,m,m',",")[[1]], nrow = 2))

match_rec("country", sounddef)
#>   Var1   Var2 Var3 Var4 Var5  Var6
#> 1 c->k ou->uh n->n t->t r->r y->ee
#> 2 c->s ou->uh n->n t->t r->r y->ee

match_rec("counro", sounddef)
#>   Var1   Var2 Var3 Var4  Var5
#> 1 c->k ou->uh n->n r->r o->oh
#> 2 c->s ou->uh n->n r->r o->oh
#> 3 c->k ou->uh n->n r->r o->uh
#> 4 c->s ou->uh n->n r->r o->uh

match_rec("ccwouo", sounddef)
#>   Var1 Var2 Var3   Var4  Var5
#> 1 c->k c->k w->w ou->uh o->oh
#> 2 c->s c->k w->w ou->uh o->oh
#> 3 c->k c->s w->w ou->uh o->oh
#> 4 c->s c->s w->w ou->uh o->oh
#> 5 c->k c->k w->w ou->uh o->uh
#> 6 c->s c->k w->w ou->uh o->uh
#> 7 c->k c->s w->w ou->uh o->uh
#> 8 c->s c->s w->w ou->uh o->uh

match_rec("", sounddef)
#> [1] ""

Finding every pronounciation permutation of a word

Answers (2)

Related Questions