Reputation: 24423
I have many audio files with clean audio containing only spoken Mandarin Chinese. I need an estimate of how many syllables are spoken in each file. Is there a tool for OS X, Windows, or Linux that can estimate this? The desired output would look something like this:
sample01.wav 15
sample02.wav 8
sample03.wav 5
sample04.wav 1
sample05.wav 18
As there are many files, command-line or batch-capable software is preferred, e.g.:
$ application sample01.wav
15
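Because the question asks for batch output of the form `sampleNN.wav count`, a minimal Python driver could look like the sketch below. It assumes 16-bit PCM WAV input and takes the syllable estimator as a parameter, so any counting method can be plugged in. `read_wav_mono`, `run_batch`, and the placeholder estimator are hypothetical names for illustration, not an existing tool.

```python
import sys
import wave
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

import numpy as np


def read_wav_mono(path: str) -> Tuple[np.ndarray, int]:
    """Read a 16-bit PCM WAV file; return (mono samples scaled to [-1, 1), sample rate)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: only 16-bit PCM is handled in this sketch")
        n_channels = wf.getnchannels()
        fs = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    # Channels are interleaved; average them down to a single mono track.
    return samples.reshape(-1, n_channels).mean(axis=1), fs


def run_batch(
    paths: Iterable[str], estimator: Callable[[np.ndarray, int], int]
) -> List[Tuple[str, int]]:
    """Apply `estimator` to each file and print one 'name count' line per file."""
    results = []
    for p in paths:
        x, fs = read_wav_mono(p)
        count = int(estimator(x, fs))
        print(f"{Path(p).name} {count}")
        results.append((Path(p).name, count))
    return results


if __name__ == "__main__":
    # Hypothetical placeholder estimator that always returns 0;
    # replace it with a real syllable counter.
    run_batch(sys.argv[1:], lambda x, fs: 0)
```

Usage, once a real estimator is swapped in: `python count_syllables.py *.wav` (the script name is arbitrary).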
Upvotes: 5
Views: 5891
Reputation: 7751
Automatic speech segmentation is an active research area, which means that no existing method works perfectly.
In 2009, de Jong and Wempe proposed a method for automatically detecting syllables in a speech signal using Praat. The method compares well with manual segmentation and has been used in many third-party scientific studies. Their scientific article (PDF) gives a detailed description of the method, along with a historical overview of previously proposed methods. The Praat script itself and a couple of tutorials can be found on a dedicated website (speechrate).
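The core idea of the de Jong and Wempe method is to treat peaks in the intensity contour as syllable nuclei. The NumPy sketch below illustrates that idea in simplified form; it is not the authors' Praat script. It keeps intensity peaks that lie within 25 dB of the loudest frame and that rise at least 2 dB above the dip separating them from the previously accepted peak. The published method also discards peaks that are not voiced, which this sketch omits. The window length, hop size, and both threshold values are assumptions, and `intensity_db` and `count_syllable_nuclei` are hypothetical names.

```python
import numpy as np


def intensity_db(x: np.ndarray, fs: int, win_s: float = 0.064, hop_s: float = 0.01) -> np.ndarray:
    """Frame-wise intensity in dB from Hann-windowed mean power.

    A rough stand-in for Praat's intensity contour (assumed parameters).
    """
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    if len(x) < win:
        x = np.pad(x, (0, win - len(x)))
    n_frames = 1 + (len(x) - win) // hop
    # Row k of idx holds the sample indices of analysis frame k.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    power = np.mean((x[idx] * np.hanning(win)) ** 2, axis=1)
    return 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0) on digital silence


def count_syllable_nuclei(
    x: np.ndarray, fs: int, silence_db: float = -25.0, min_dip_db: float = 2.0
) -> int:
    """Count intensity peaks that look like syllable nuclei (no voicing check)."""
    db = intensity_db(x, fs)
    threshold = db.max() + silence_db  # peaks quieter than this are treated as silence
    count = 0
    last_peak = None
    for i in range(1, len(db) - 1):
        is_peak = db[i] > db[i - 1] and db[i] >= db[i + 1]
        if not is_peak or db[i] <= threshold:
            continue
        # Require a real dip since the last accepted peak, so small ripples
        # on one vowel are not counted as separate syllables.
        if last_peak is None or db[i] - db[last_peak:i + 1].min() >= min_dip_db:
            count += 1
            last_peak = i
    return count
```

`count_syllable_nuclei(x, fs)` has the `(samples, sample_rate) -> int` shape expected by a batch loop. Expect to tune `silence_db` and `min_dip_db` against a few hand-counted files, since fast Mandarin speech can blur the dips between neighbouring syllables.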
You may also be interested in another segmentation algorithm, developed by Harma, which has been implemented in Matlab (Harma Syllable Segmentation).
Upvotes: 13
Reputation: 2934
Your question calls for a specific speech-to-text solution. I doubt that any free, open-source, readily available library will serve this purpose.
I have used one, but for the reverse purpose (text to speech). It is not a free library, but if you want to look into it, just Google "annosoft lipsync":
http://www.annosoft.com/lipsync-sdks
An evaluation version of the SDK is also available.
Upvotes: 0
Reputation: 4823
This might be of interest to you:
http://sites.google.com/site/speechrate/
Upvotes: 1
Reputation: 2752
You can use formants to determine this. Each syllable should correspond to a formant. Here is more information on formants:
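The answer does not say how to obtain formants. One common textbook route is linear prediction (LPC): fit an all-pole model to a short frame and read resonance frequencies and bandwidths off the polynomial's roots, using F = angle · fs / (2π) and B = −(fs / π) · ln|z|. The sketch below shows only that per-frame step, as an assumption about how the suggestion could be implemented. The 400 Hz bandwidth cutoff is an assumed value; a common rule of thumb for the model order is about fs/1000 + 2; and frames are expected to be pre-windowed (and usually pre-emphasised) by the caller. Turning per-frame formants into a syllable count, for example by counting stretches of vowel-like frames, is not shown. `lpc` and `formants_from_lpc` are hypothetical names.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC polynomial [1, a1, ..., ap] via the autocorrelation method."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    # Toeplitz normal equations R a = -r[1:], with R[i, j] = r[|i - j|].
    R = r[np.abs(np.subtract.outer(np.arange(order), np.arange(order)))]
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def formants_from_lpc(a: np.ndarray, fs: float, max_bw: float = 400.0):
    """Return (frequency_hz, bandwidth_hz) pairs for narrow resonances of 1/A(z), sorted by frequency."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root from each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -(fs / np.pi) * np.log(np.abs(roots))
    keep = bws < max_bw  # very broad resonances are unlikely to be formants
    return sorted(zip(freqs[keep], bws[keep]))
```

Applied frame by frame (e.g. 25 ms Hamming-windowed frames), this yields formant tracks; vowels, which form the nuclei of Mandarin syllables, usually show a clear low first formant.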
https://en.wikipedia.org/wiki/Formants
Upvotes: 1