Village

Reputation: 24423

How to count the number of spoken syllables in an audio file?

I have many audio files containing clean audio with only spoken Mandarin Chinese. I need to estimate how many syllables are spoken in each file. Is there a tool for OS X, Windows, or Linux that can estimate this? For example:

sample01.wav 15
sample02.wav 8
sample03.wav 5
sample04.wav 1
sample05.wav 18

As there are many files, command-line or batch-capable software is preferred, e.g.:

$ application sample01.wav
15

Upvotes: 5

Views: 5891

Answers (4)

marsei

Reputation: 7751

Automatic segmentation of speech is an active research area, and no existing method works perfectly.

In 2009, de Jong and Wempe proposed a method to automatically detect syllables in a human speech signal using Praat. This method compares well with manual segmentation and has been used in many third-party scientific studies. You can find a detailed description of the method in their scientific article (pdf), along with a historical perspective on previously proposed methods. The Praat script itself and a couple of tutorials can be found on a dedicated website (www - speechrate).
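As far as I understand it, the core idea of this kind of approach is to count peaks in the intensity contour that are separated by sufficiently deep dips. Below is a rough, pure-Python sketch of that idea only. It is not the de Jong and Wempe script: it skips the voicing check, and the function names, the -25 dB relative threshold, the -60 dB absolute floor, and the 2 dB dip are all assumptions picked for illustration. It only reads 16-bit PCM WAV files.

```python
import math
import sys
import wave
from array import array


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (samples in [-1, 1), sample_rate).

    Only the first channel is kept for multi-channel files.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        data = array("h")
        data.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in data[::channels]], rate


def intensity_db(samples, rate, frame_s=0.04, hop_s=0.01):
    """Short-time intensity contour in dB (mean power per frame)."""
    if not samples:
        return []
    frame = max(1, int(frame_s * rate))
    hop = max(1, int(hop_s * rate))
    contour = []
    for start in range(0, max(1, len(samples) - frame + 1), hop):
        chunk = samples[start:start + frame]
        power = sum(x * x for x in chunk) / len(chunk)
        contour.append(10.0 * math.log10(power + 1e-12))
    return contour


def count_syllable_nuclei(samples, rate, silence_db=-25.0,
                          min_dip_db=2.0, floor_db=-60.0):
    """Count intensity peaks that are bracketed by dips of at least min_dip_db.

    silence_db is relative to the loudest frame; floor_db is an absolute
    lower bound so that an all-silent file yields 0. All values are guesses.
    """
    contour = intensity_db(samples, rate)
    if not contour:
        return 0
    threshold = max(max(contour) + silence_db, floor_db)
    count = 0
    low = float("-inf")  # lowest level since the last counted peak
    high = None          # level of the candidate peak being tracked
    for level in contour:
        if high is None:
            low = min(low, level)
            if level >= threshold and level - low >= min_dip_db:
                high = level  # rose far enough above the dip: candidate
        elif level > high:
            high = level      # still climbing
        elif high - level >= min_dip_db:
            count += 1        # dropped far enough below the peak: count it
            low = level
            high = None
    if high is not None:
        count += 1            # audio ended while still on a peak
    return count
```

For batch use, calling `count_syllable_nuclei(*read_wav_mono(path))` for each file and printing the result would give one number per file. Expect the thresholds to need tuning on real recordings, and expect pure Python to be slow on long files.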

You may also be interested in another segmentation algorithm, developed by Harma and implemented in Matlab (Harma Syllable Segmentation).

Upvotes: 13

Aditya

Reputation: 2934

Your question is essentially a speech-to-text problem and needs a solution built for that. I really doubt that any free, open-source library that is easily available will serve this purpose.

I have used one library, but for the reverse purpose (text to speech). It is not free, but it may help: just Google "annosoft lipsync".

http://www.annosoft.com/lipsync-sdks

An evaluation version of the SDK is also available.

Upvotes: 0

Navneet

Reputation: 4823

This might be of interest to you:

http://sites.google.com/site/speechrate/

Upvotes: 1

Skylion

Reputation: 2752

You can use formants to determine this. Each syllable should correspond to a formant. Here is more information on formants:

https://en.wikipedia.org/wiki/Formants

Upvotes: 1
