terratermatwoa

Reputation: 99

Streaming audio through WebSockets to Web Audio player

I have a somewhat working system that

  1. Produces audio on a server into a 1-second WAV file
  2. Reads the WAV file and sends it through a WebSocket
  3. The WebSocket client passes the binary data to AudioContext.decodeAudioData
  4. Decoded audio is buffered until 4 packets (4 seconds) have arrived
  5. The buffer is processed and each clip is passed to AudioBufferSourceNode.start(time), where time = (clip_count * duration)

So if I have 4 audio clips, the calls would look like

AudioBufferSourceNode.start(0);
AudioBufferSourceNode.start(1);
AudioBufferSourceNode.start(2);
AudioBufferSourceNode.start(3);
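
Expanded, my client logic is roughly this (a sketch; `socket` is my WebSocket connection, and I have left out the gain node for brevity):

    const ctx = new AudioContext();
    const pending = [];   // decoded 1-second AudioBuffers waiting to play
    let clipCount = 0;    // total clips scheduled so far

    socket.binaryType = 'arraybuffer';
    socket.onmessage = (event) => {
      ctx.decodeAudioData(event.data, (buffer) => {
        pending.push(buffer);
        if (pending.length >= 4) {          // 4 packets = 4 seconds buffered
          for (const clip of pending) {
            const source = ctx.createBufferSource();
            source.buffer = clip;
            source.connect(ctx.destination);
            source.start(clipCount * clip.duration);  // time = clip_count * duration
            clipCount++;
          }
          pending.length = 0;
        }
      });
    };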

I thought this would perfectly schedule 4 seconds of audio, but I seem to be facing clock issues, perhaps because I am expecting the audio clock to be perfect. I have already used a gain node to remove clicks between each sound clip (1 second), but I start to get timing issues either right away or after a long period of time. Basically, in the worst case, my audio plays like this:

 ---------------------    ----------     ----------     ----------
| 1 second | 1 second |  |  950ms   |   |  900ms   |   |  850ms   |
 ---------------------    ----------     ----------     ----------
                      gap            gap            gap

In this diagram, "1 second" and "#ms" show how much audio actually plays; it should always be 1 second. As the audio progresses, gaps also seem to develop between clips. I guess that even when I tell the audio context to play a clip at exactly 0, it's fine, but the other scheduled audio clips may or may not start on time.

Is this correct, or is something else going wrong in my system? Can I rely 100% on an audio clip being scheduled to play at exactly the right time, or do I need to add some calculations to allow for a few milliseconds of tolerance in when to play?

Upvotes: 4

Views: 4363

Answers (1)

Zav

Reputation: 661

It looks like AudioWorkletNode is the tool that serves the purpose of this task.

According to the AudioBufferSourceNode documentation:

The AudioBufferSourceNode interface is an AudioScheduledSourceNode which represents an audio source consisting of in-memory audio data, stored in an AudioBuffer. It's especially useful for playing back audio which has particularly stringent timing accuracy requirements, such as for sounds that must match a specific rhythm and can be kept in memory rather than being played from disk or the network. To play sounds which require accurate timing but must be streamed from the network or played from disk, use an AudioWorkletNode to implement its playback.

Your case is exactly that: streaming from the network. AudioBufferSourceNode is not designed to be updated on the fly from the network.
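
A minimal sketch of the worklet approach, assuming the server sends raw PCM frames (mono, Float32, at the context's sample rate) that the page forwards to the worklet; the `pcm-player` name is arbitrary:

    // pcm-player.js: runs on the dedicated audio rendering thread
    class PcmPlayer extends AudioWorkletProcessor {
      constructor() {
        super();
        this.chunks = [];      // queued Float32Array sample chunks
        this.offset = 0;       // read position inside chunks[0]
        this.port.onmessage = (event) => this.chunks.push(event.data);
      }
      process(inputs, outputs) {
        const output = outputs[0][0]; // first channel of first output
        for (let i = 0; i < output.length; i++) {
          if (this.chunks.length === 0) { output[i] = 0; continue; } // underrun: silence
          output[i] = this.chunks[0][this.offset++];
          if (this.offset === this.chunks[0].length) {
            this.chunks.shift();
            this.offset = 0;
          }
        }
        return true; // keep the processor alive
      }
    }
    registerProcessor('pcm-player', PcmPlayer);

    // main thread (inside an async function): feed WebSocket data to the worklet
    const ctx = new AudioContext();
    await ctx.audioWorklet.addModule('pcm-player.js');
    const player = new AudioWorkletNode(ctx, 'pcm-player');
    player.connect(ctx.destination);
    socket.binaryType = 'arraybuffer';
    socket.onmessage = (event) => player.port.postMessage(new Float32Array(event.data));

Because process() runs on the audio thread in fixed 128-frame quanta, playback no longer depends on main-thread timer accuracy: a late packet produces a brief silence instead of accumulating drift.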

What can lead to desync:

  1. By the nature of the JavaScript scheduler, there is no guarantee that code executes at an exact time. The Node.js process might be performing another job at the same moment, which delays sending the data.
  2. The timer only runs on the next tick after all the data has been sent, which can take some time.
  3. The client-side scheduler has even more restrictions than the server-side one. Generally, the browser can run around 250 timers per second (one every 4 ms); the resulting jitter is easy to observe, as the sketch after this list shows.
  4. The API being used is not designed for this flow.
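
A quick way to observe this jitter in a browser console (the numbers vary with machine and load):

    // Ask for a 1000 ms interval and log what actually elapses.
    let last = performance.now();
    setInterval(() => {
      const now = performance.now();
      console.log(`elapsed: ${(now - last).toFixed(1)} ms`); // rarely exactly 1000.0
      last = now;
    }, 1000);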

Recommendations:

  1. Always keep a buffer. If for some reason the buffered frames have already been played, it may be reasonable to request new ones faster.
  2. Grow the buffer on the fly. It is fine to start playing after receiving two messages, but it may be reasonable to increase the number of buffered messages on the fly to something like 15 seconds (see the sketch after this list).
  3. Prefer another tool for the connection and data transfer; Nginx will serve perfectly. Otherwise, a client with a slow connection will "hold" the Node.js process until the data has been transferred.
  4. If the connection drops for a second (on a mobile network, for example), there should be something to restore state from the proper frame, update the buffer, and do all of that without interruptions.
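
A sketch of recommendations 1 and 2 together; the thresholds and the `requestMoreFrames` helper are made up for illustration:

    let targetSeconds = 2;           // start playback once ~2 s are buffered
    const MAX_TARGET_SECONDS = 15;   // upper bound, per recommendation 2

    function onUnderrun() {
      // The buffer ran dry: raise the target and fetch more aggressively.
      targetSeconds = Math.min(targetSeconds * 2, MAX_TARGET_SECONDS);
      requestMoreFrames();           // hypothetical: ask the server for more audio
    }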

Upvotes: 3
