Luke Wood

Reputation: 19

Is there a way to display an audio wave as Windows Speech Synthesizer speaks?

I'm making a program that uses Windows Speech Recognition to listen out for commands and I am using the Speech Synthesizer to provide real-time feedback. I was wondering whether it would be possible to use the result from the synthesizer to create an audio wave (similar to what you would see in something like Audacity when you record your voice), that would be displayed in real-time as the synthesizer continues to speak. I am trying to give the effect of being able to 'see' the program talk, not just hear it. I have no idea where to start and any advice/help will be greatly appreciated.

Upvotes: 0

Views: 549

Answers (1)

MrPaulch

Reputation: 1418

From Windows Vista on, you can capture the audio buffer of the current audio session via the Windows Audio Session API (WASAPI).

Now, WASAPI isn't great to call from managed applications. You might need to P/Invoke the functions. But you are in luck! There is a managed library wrapping that API: CSCore.

It provides a number of useful objects for playing around with audio buffers and streams. You can load the package into your project via NuGet.

To create a Stream for capturing the live audio buffer you'd need to do something like this:

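            // Requires: using System; using System.IO;
            //           using CSCore.SoundIn; using CSCore.Codecs.WAV;
            using (WasapiCapture capture = new WasapiLoopbackCapture()) {
                capture.Initialize();
                using (MemoryStream mstr = new MemoryStream())
                using (WaveWriter wvWriter = new WaveWriter(mstr, capture.WaveFormat)) {
                    // Raised whenever a new block of captured audio is available.
                    capture.DataAvailable +=
                        (object sender, DataAvailableEventArgs e) => {
                            wvWriter.Write(e.Data, e.Offset, e.ByteCount);
                            // Do some stuff with that data!
                        };
                    capture.Start();      // nothing is captured until Start() is called
                    Console.ReadKey();    // keep the using blocks alive while capturing
                    capture.Stop();
                }
            }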

To learn how to create a waveform from the data you pump into the stream, you might want to check out some tutorials. (Hint: Ask Google)

To get you on the way, have a look at this Stack Overflow question or this CodeProject article.
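As a rough idea of what those tutorials boil down to, here is a minimal sketch that reduces a block of samples to one (min, max) pair per pixel column. It assumes you already have mono float samples in the -1.0 to 1.0 range; the `WaveformSketch` class and `ComputePeaks` method are made-up names for illustration, not part of any library.

```csharp
using System;

static class WaveformSketch
{
    // Reduces mono samples to one (min, max) pair per pixel column.
    // Drawing a vertical line from min to max in each column gives the
    // familiar Audacity-style outline.
    public static (float Min, float Max)[] ComputePeaks(float[] samples, int columns)
    {
        var peaks = new (float Min, float Max)[columns];
        for (int c = 0; c < columns; c++)
        {
            // Integer bucket boundaries; long math avoids overflow on big buffers.
            int start = (int)((long)c * samples.Length / columns);
            int end = (int)((long)(c + 1) * samples.Length / columns);

            // Both start at 0 so every column includes the center line,
            // and an empty bucket draws as silence.
            float min = 0f, max = 0f;
            for (int i = start; i < end; i++)
            {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            peaks[c] = (min, max);
        }
        return peaks;
    }
}
```

You would call something like this on each captured block with `columns` set to the width of the area you are drawing into, then repaint.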


Also note that most tutorials cover how to create waveforms from the standard 44.1 kHz 16-bit stereo PCM audio format. Windows likes to buffer its audio as 32-bit IEEE float stereo, usually at 44.1 or 48 kHz (check capture.WaveFormat for the actual values). At 44.1 kHz, that means you'll have 88,200 32-bit samples a second to process (44,100 per channel, interleaved across 2 channels), with float values ranging from -1.0 to 1.0 (instead of -32,768 to +32,767 integer values).
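To make the format difference concrete, here is a small sketch of turning the raw bytes from the capture callback into samples. It assumes the buffer holds interleaved 32-bit IEEE float frames in the machine's byte order and that samples nominally lie between -1.0 and 1.0; `SampleDecoder`, `ToMono` and `ToInt16` are hypothetical helper names, so check the real format reported by the capture object before relying on it.

```csharp
using System;

static class SampleDecoder
{
    // Decodes interleaved 32-bit float frames (L, R, L, R, ...) and averages
    // the channels of each frame into a single mono sample.
    public static float[] ToMono(byte[] data, int offset, int byteCount, int channels)
    {
        const int bytesPerSample = 4;               // 32-bit float
        int frameSize = bytesPerSample * channels;  // one sample per channel
        int frames = byteCount / frameSize;         // ignore any trailing partial frame
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int ch = 0; ch < channels; ch++)
                sum += BitConverter.ToSingle(data, offset + f * frameSize + ch * bytesPerSample);
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Maps a float sample to the 16-bit integer range most tutorials expect,
    // clamping anything outside -1.0 to 1.0 first.
    public static short ToInt16(float sample)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, sample));
        return (short)(clamped * short.MaxValue);
    }
}
```

Inside the callback you could then write something like `SampleDecoder.ToMono(e.Data, e.Offset, e.ByteCount, capture.WaveFormat.Channels)` and feed the result to your drawing code.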

Windows does this internally because floating-point samples are better for mixing different audio sources.
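A toy illustration of why that is (my own sketch, not how the Windows mixer is actually implemented; `MixSketch` and its methods are made-up names): summing two loud 16-bit samples wraps around, while the float sum simply goes above 1.0 and can be scaled back down afterwards.

```csharp
using System;

static class MixSketch
{
    // Adds two 16-bit samples the naive way; the result wraps around
    // when the sum does not fit into 16 bits.
    public static short AddInt16(short a, short b)
    {
        return unchecked((short)(a + b));
    }

    // Adds two float samples and applies a gain afterwards; the intermediate
    // sum may exceed 1.0 without being damaged.
    public static float AddFloat(float a, float b, float gain)
    {
        return (a + b) * gain;
    }
}
```

For example, `MixSketch.AddInt16(30000, 30000)` comes out negative because the sum overflowed, whereas `MixSketch.AddFloat(0.9f, 0.9f, 0.5f)` gives back roughly 0.9.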

Upvotes: 0
