jeo.e

Reputation: 263

How to use web audio api to get raw pcm audio?

How do I use getUserMedia to capture the microphone in Chrome and then stream it to get the raw audio? I need to get the audio in linear 16 (LPCM).

Upvotes: 17

Views: 23839

Answers (6)

FatalFlaw

Reputation: 1163

Contrary to what many have said, it is possible to get PCM direct from the vanilla MediaRecorder, at least on Chrome:

const audioRecorder = new MediaRecorder(mediaStream, {
    mimeType: 'audio/webm;codecs=pcm'
});

Unpacking Matroska/WebM is a little involved. You can use ts-ebml, which will decode your chunks into WebM blocks. In my experience it's enough to capture the "a3" SimpleBlocks; after removing the 4-byte header from each, the remainder of the data buffer is audio samples.

The final gotcha is that the audio samples are 32-bit little-endian floats. Here's some sample code to do the conversion from chunk to PCM:

import { Decoder } from 'ts-ebml';

...

const decoder = new Decoder();

audioRecorder.ondataavailable = async (event: BlobEvent) => {

    // webm/matroska blocks of type "a3" (SimpleBlock) contain
    // raw audio as 32-bit little-endian floats
    // https://darkcoding.net/software/reading-mediarecorders-webm-opus-output/

    const ab = await event.data.arrayBuffer();
    const els = decoder.decode(ab);
    const blks = els.filter(el => el.EBML_ID === "a3");
    // count the number of bytes (minus the 4-byte header) in all "a3" blocks
    const sampleBytes = blks.reduce((acc, el) => acc + (el.dataSize - 4), 0);
    // a contiguous array of bytes which are actually floats;
    // we suffer this double copy as the blocks don't end nicely
    // on 32-bit float boundaries
    const floatsAsBytes = new Uint8Array(sampleBytes);
    // this is the final array we will send to be consumed
    const pcm = new Int16Array(sampleBytes / 4);
    // concatenate all the float data into one array
    let offset = 0;
    blks.forEach((el) => {
        const eldata = (el as any).data.slice(4);
        floatsAsBytes.set(eldata, offset);
        offset += el.dataSize - 4;
    });
    // convert the floats (via DataView) into signed 16-bit PCM data
    const floats = new DataView(floatsAsBytes.buffer, floatsAsBytes.byteOffset, floatsAsBytes.byteLength);
    offset = 0;
    for (let s = 0; s < sampleBytes; s += 4) {
        // clamp to [-1, 1] before scaling, in case of loud input
        const f = Math.max(-1, Math.min(1, floats.getFloat32(s, true)));
        pcm[offset++] = (f >= 0) ? f * 0x7FFF : f * 0x8000;
    }
    // send it to the hub
    consumeAudio(pcm.buffer);
};

audioRecorder.start(100);

I'm sure it wouldn't be hard to adapt the above to handle multiple channels, demuxing video, etc.

Upvotes: 1

Scott Stensland

Reputation: 28325

Here is some Web Audio API code that uses the microphone to capture and play back raw audio (turn down your volume before running this page). To see snippets of the raw audio in PCM format, view the browser console. For kicks it also sends this PCM through an FFT, to obtain the frequency domain as well as the time domain of the audio curve.

<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>capture microphone then show time &amp; frequency domain output</title>
 
<script type="text/javascript">

var webaudio_tooling_obj = function () {

    var audioContext = new AudioContext();

    console.log("audio is starting up ...");

    var BUFF_SIZE_RENDERER = 16384;
    var SIZE_SHOW = 3; // number of array elements to show in console output

    var audioInput = null,
    microphone_stream = null,
    gain_node = null,
    script_processor_node = null,
    script_processor_analysis_node = null,
    analyser_node = null;

    navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia ||
        navigator.mozGetUserMedia || navigator.msGetUserMedia;

    if (navigator.getUserMedia){

        navigator.getUserMedia({audio:true}, 
            function(stream) {
                start_microphone(stream);
            },
            function(e) {
                alert('Error capturing audio.');
            }
            );

    } else { alert('getUserMedia not supported in this browser.'); }

    // ---

    function show_some_data(given_typed_array, num_row_to_display, label) {

        var size_buffer = given_typed_array.length;
        var index = 0;

        console.log("__________ " + label);

        if (label === "time") {

            for (; index < num_row_to_display && index < size_buffer; index += 1) {

                var curr_value_time = (given_typed_array[index] / 128) - 1.0;

                console.log(curr_value_time);
            }

        } else if (label === "frequency") {

            for (; index < num_row_to_display && index < size_buffer; index += 1) {

                console.log(given_typed_array[index]);
            }

        } else {

            throw new Error("ERROR - must pass time or frequency");
        }
    }

    function process_microphone_buffer(event) {

        // this is where each chunk of raw audio arrives as 32-bit floats
        var microphone_output_buffer = event.inputBuffer.getChannelData(0); // just mono - 1 channel for now
    }

    function start_microphone(stream){

        gain_node = audioContext.createGain();
        gain_node.connect( audioContext.destination );

        microphone_stream = audioContext.createMediaStreamSource(stream);
        microphone_stream.connect(gain_node); 

        script_processor_node = audioContext.createScriptProcessor(BUFF_SIZE_RENDERER, 1, 1);
        script_processor_node.onaudioprocess = process_microphone_buffer;

        microphone_stream.connect(script_processor_node);

        // --- enable volume control for output speakers

        document.getElementById('volume').addEventListener('change', function() {

            var curr_volume = this.value;
            gain_node.gain.value = curr_volume;

            console.log("curr_volume ", curr_volume);
        });

        // --- setup FFT

        script_processor_analysis_node = audioContext.createScriptProcessor(2048, 1, 1);
        script_processor_analysis_node.connect(gain_node);

        analyser_node = audioContext.createAnalyser();
        analyser_node.smoothingTimeConstant = 0;
        analyser_node.fftSize = 2048;

        microphone_stream.connect(analyser_node);

        analyser_node.connect(script_processor_analysis_node);

        var buffer_length = analyser_node.frequencyBinCount;

        var array_freq_domain = new Uint8Array(buffer_length);
        var array_time_domain = new Uint8Array(buffer_length);

        console.log("buffer_length " + buffer_length);

        script_processor_analysis_node.onaudioprocess = function() {

            // fill the arrays with the current frequency and time domain data
            analyser_node.getByteFrequencyData(array_freq_domain);
            analyser_node.getByteTimeDomainData(array_time_domain);

            show_some_data(array_freq_domain, SIZE_SHOW, "frequency");
            show_some_data(array_time_domain, SIZE_SHOW, "time"); // store this to record to aggregate buffer/file
        };
    }

}; //  webaudio_tooling_obj = function()

</script>

</head>
<body>

    <p>Volume</p>
    <input id="volume" type="range" min="0" max="1" step="0.1" value="0.0"/>

    <p> </p>
    <button onclick="webaudio_tooling_obj()">start audio</button>

</body>
</html>

NOTICE - before running the above in your browser, first turn down your volume. The code both listens to your microphone and sends real-time output to the speakers, so naturally you will hear feedback --- as in Jimi Hendrix feedback.

Upvotes: 2

Goddard

Reputation: 3069

This library adds support for audio/pcm. It is basically a drop-in replacement.

https://github.com/streamproc/MediaStreamRecorder

Replace the MediaRecorder class name with MediaStreamRecorder and then set the mime type to pcm, something like below.

var mediaRecorder = new MediaStreamRecorder(stream);
mediaRecorder.mimeType = 'audio/pcm';

Upvotes: 0

imbatman

Reputation: 518

The only two examples I've found that are clear and make sense are the following:

AWS Labs: https://github.com/awslabs/aws-lex-browser-audio-capture/blob/master/lib/worker.js

The AWS resource is very good. It shows you how to export your recorded audio in "WAV format encoded as PCM". Amazon Lex, an AWS service for building conversational interfaces, requires the audio to be PCM encoded and wrapped in a WAV container. You can merely adapt some of the code to make it work for you! AWS has some additional features, such as "downsampling", which allows you to change the sample rate without affecting the recording.
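To illustrate what that downsampling amounts to: a naive sketch of resampling a Float32 buffer to a lower rate (my own illustration, not the AWS code; the function name is made up, and a production resampler should low-pass filter first to avoid aliasing):

```javascript
// Naively downsample by picking the nearest source sample for each
// output sample. Works only when toRate <= fromRate.
function downsample(samples, fromRate, toRate) {
  if (toRate >= fromRate) return samples;
  const ratio = fromRate / toRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // index of the nearest earlier source sample
    out[i] = samples[Math.floor(i * ratio)];
  }
  return out;
}
```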

RecordRTC: https://github.com/muaz-khan/RecordRTC/blob/master/simple-demos/raw-pcm.html

RecordRTC is a complete library. You can, once again, adapt their code or find the snippet of code that encodes the audio to raw PCM. You could also implement their library and use the code as-is. Using the "desiredSampleRate" option for audio config with this library negatively affects the recording.

They are both excellent resources and you'll definitely be able to solve your question.

Upvotes: 11

Brad

Reputation: 163548

Unfortunately, the MediaRecorder doesn't support raw PCM capture. (A sad oversight, in my opinion.) Therefore, you'll need to get the raw samples and buffer/save them yourself.

You can do this with the ScriptProcessorNode. Normally, this Node is used to modify the audio data programmatically, for custom effects and what not. But, there's no reason you can't just use it as a capture point. Untested, but try something like this code:

const captureNode = audioContext.createScriptProcessor(8192, 1, 1);
captureNode.addEventListener('audioprocess', (e) => {
  const rawLeftChannelData = e.inputBuffer.getChannelData(0);
  // rawLeftChannelData is now a typed array with floating point samples
});

(You can find a more complete example on MDN.)

Those floating point samples are centered on zero and will ideally be bound to -1 and 1. When converting to an integer range, you'll want to clamp values to this range, clipping anything beyond it. (The values can sometimes exceed -1 and 1 in the event loud sounds are mixed together in-browser. In theory, the browser can also record float32 samples from an external sound device which may also exceed that range, but I don't know of any browser/platform that does this.)

When converting to integer, it matters if the values are signed or unsigned. If signed, for 16-bit, the range is -32768 to 32767. For unsigned, it's 0 to 65535. Figure out what format you want to use and scale the -1 to 1 values up to that range.
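For signed 16-bit output, the clamp-and-scale step could look like this (a minimal sketch; the function name is my own):

```javascript
// Convert Web Audio float samples (nominally -1..1) to signed
// 16-bit PCM, clamping out-of-range values first.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // clamp to [-1, 1] to avoid integer wrap-around on loud input
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // scale: negative values reach -32768, positive values 32767
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return pcm;
}
```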

One final note on this conversion... endianness can matter. See also: https://stackoverflow.com/a/7870190/362536

Upvotes: 11

LegenJerry

Reputation: 424

You should look into the MediaTrackConstraints.sampleSize property for the MediaDevices.getUserMedia() API. Using the sampleSize constraint, if your audio hardware permits it, you can set the sample size to 16 bits.

As far as the implementation goes, well, that's what the links and Google are for...
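That said, the constraint request could look like this minimal sketch (my own; note that sampleSize is only a hint the browser and hardware are free to ignore, so check getSettings() on the resulting track to see what you actually got):

```javascript
// Constraint object asking for 16-bit mono audio; sampleSize is a
// hint that the browser/hardware may not honor.
const audioConstraints = { audio: { sampleSize: 16, channelCount: 1 } };

// In a browser you would then request the stream and verify:
// navigator.mediaDevices.getUserMedia(audioConstraints).then((stream) => {
//   const settings = stream.getAudioTracks()[0].getSettings();
//   console.log('actual sampleSize:', settings.sampleSize);
// });
```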

Upvotes: -2
