Speech Recognition: detecting Japanese Kana (consonant + vowel)

Question

I would like to find some open source code (although I would settle for a closed source product) that converts an incoming audio stream of spoken Japanese into Kana (i.e. consonant+vowel pairs) and prints them out pretty much in real time.

However, I want to use these basic sound units for my own custom purpose, so I don't want any high-level processing that tries to extract genuine Japanese words. I just want to get the raw Kana.

Is anyone aware of such a technology?

I just learned today that the Japanese 'alphabet' is basically a 10x5 grid of Kana: 10 columns (empty + 9 consonants) and 5 rows (vowels). Each element is called a 'Kana', and the language consists of sequences of these Kana; they are the basic building blocks.
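To make the grid concrete, here is a minimal Python sketch of it as a lookup table. It is an idealized picture: the real chart has a few empty cells (written `-` below), the romanized keys are nominal (し is pronounced closer to "shi" than "si"), and the standalone ん sits outside the grid. The names `CONSONANTS`, `VOWELS`, `COLUMNS` and `GRID` are my own, for illustration only.

```python
# Idealized kana grid: 10 consonant columns (the first is "no consonant")
# by 5 vowel rows. Keys are nominal consonant+vowel labels, not exact
# pronunciations.
CONSONANTS = ["", "k", "s", "t", "n", "h", "m", "y", "r", "w"]
VOWELS = ["a", "i", "u", "e", "o"]

# One string per consonant column, in a-i-u-e-o order; "-" marks a cell
# with no kana in modern use.
COLUMNS = [
    "あいうえお",
    "かきくけこ",
    "さしすせそ",
    "たちつてと",
    "なにぬねの",
    "はひふへほ",
    "まみむめも",
    "や-ゆ-よ",
    "らりるれろ",
    "わ---を",
]

# Map (consonant, vowel) -> kana, skipping the empty cells.
GRID = {
    (c, v): kana
    for c, column in zip(CONSONANTS, COLUMNS)
    for v, kana in zip(VOWELS, column)
    if kana != "-"
}

print(GRID[("k", "a")])  # the kana in column "k", row "a"
print(len(GRID))         # number of filled cells
```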

This must surely have a large impact on speech recognition algorithms.

For Western languages, all the commercial speech recognition engines I am aware of derive from CMUSphinx, which operates on a tri-gram model: it represents each movement across three phonemes with a unique MFCC vector and figures out the most likely tri-gram sequence(s) for an utterance. From those it can trivially deduce the phonemes, and then run through its dictionary of word triplets to figure out the most likely sentence.

But for a language such as Japanese, I would guess that this may no longer be the most efficient algorithm.

Instead, it may make sense to try and catch each individual Kana, or Kana-pair.

...which is going to be a 2-gram or a 4-gram, but not a 3-gram!
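As a rough sketch of what I mean, assume some recognizer already emits a flat sequence of phoneme labels (the labels and the function names `group_into_kana` and `kana_pairs` are hypothetical, not from any real engine). Grouping each consonant with the vowel that follows it gives Kana-sized units, and adjacent units give Kana pairs:

```python
VOWEL_SET = set("aiueo")

def group_into_kana(phonemes):
    """Greedily merge each consonant with the vowel that follows it.

    A lone vowel, or a consonant not followed by a vowel (for example
    the standalone "n"), becomes a unit of its own.
    """
    units = []
    i = 0
    while i < len(phonemes):
        current = phonemes[i]
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if current not in VOWEL_SET and following in VOWEL_SET:
            units.append(current + following)  # consonant + vowel
            i += 2
        else:
            units.append(current)
            i += 1
    return units

def kana_pairs(units):
    """Return each pair of adjacent Kana-sized units."""
    return list(zip(units, units[1:]))

# Hypothetical phoneme stream for "konnichiwa".
stream = ["k", "o", "n", "n", "i", "ch", "i", "w", "a"]
units = group_into_kana(stream)
print(units)
print(kana_pairs(units))
```

This only regroups symbols after the fact; it says nothing about how the acoustic model itself should be structured, which is the part I am asking about.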

Is there anything out there? Or do they just use the same engines the Western world does?

