Sphinx4 Phoneme Segmentation

Question

I am developing a system that needs the start frame, end frame and segmentation score of each phoneme in a word or sentence. I have been using the Sphinx-3 command sphinx3_align, which produces output like the following example:

     SFrm  EFrm   SegAScr Phone
        0    21    -67327 SIL
       22    37   -236740 AH SIL K b
       38    41    -61028 K AH S i
       42    56    -82368 S K EH i
       57    67   -106366 EH S P i
       68    86   -101908 P EH T i
       87   106    -89226 T P SIL e
      107   113    -82281 SIL
 Total score:     -827244
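
For anyone post-processing this output, here is a minimal Python sketch that parses it into records. The names `Segment` and `parse_alignment` are hypothetical, and the sketch assumes the layout matches the example above exactly: three integer columns, then the phone field, then a "Total score:" line. The phone field is kept verbatim rather than interpreted.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One row of the alignment output (hypothetical record type)."""
    start_frame: int
    end_frame: int
    score: int
    phone: str  # whole phone field kept verbatim, e.g. "AH SIL K b"


def parse_alignment(text: str) -> Tuple[List[Segment], Optional[int]]:
    """Parse output shaped like the example above.

    Returns the list of segments and the total score (None if the
    "Total score:" line is missing). Header and blank lines are skipped.
    """
    segments: List[Segment] = []
    total: Optional[int] = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Total score:"):
            total = int(stripped.split(":", 1)[1])
            continue
        fields = stripped.split()
        if len(fields) < 4:
            continue
        try:
            start, end, score = (int(f) for f in fields[:3])
        except ValueError:
            # Not a data row (for example the "SFrm EFrm SegAScr Phone" header).
            continue
        segments.append(Segment(start, end, score, " ".join(fields[3:])))
    return segments, total


# Sample copied from the output shown in the question.
SAMPLE = """\
     SFrm  EFrm   SegAScr Phone
        0    21    -67327 SIL
       22    37   -236740 AH SIL K b
       38    41    -61028 K AH S i
       42    56    -82368 S K EH i
       57    67   -106366 EH S P i
       68    86   -101908 P EH T i
       87   106    -89226 T P SIL e
      107   113    -82281 SIL
 Total score:     -827244
"""

if __name__ == "__main__":
    segments, total = parse_alignment(SAMPLE)
    for seg in segments:
        print(seg)
    print("total:", total)
```

Keeping the phone field as a single string avoids guessing at the meaning of its sub-columns; split it further only once the format is confirmed for your model.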

The problem is that I have to run this command many times, and it consumes a lot of memory on my server. I tried passing many inputs in a single control file, but that takes a long time to process, and my application cannot tolerate high response times.

So, to use less memory while keeping the response time low, I am trying to implement the same system in Sphinx-4. That way I could return results right after alignment, without having to unload the application every time it runs.

My question is: is it possible to get the output shown above (similar to that of sphinx3_align) in Sphinx-4?

Nikolay Shmyrev · Accepted Answer

In its current state, Sphinx-4 cannot do this. The feature is not implemented.
