Jakob Kristensen

Reputation: 169

Sound detection using Python

To perform an end-to-end test of an embedded platform that plays musical notes, we are trying to record via a microphone and identify whether a specific sound was played through the device's speakers. The testing setup is not a real-time system, so we don't really know when (or even if) the expected sound begins and ends.

The expected sound is represented in a wave file (or similar) we can read from disk.

How can we run a test that asserts whether the sound was played as expected?

Upvotes: 0

Views: 8704

Answers (2)

Jon Nordby

Reputation: 6259

The kind of sound you are describing, which has a well-defined duration and can be counted, is called a sound event. The task of detecting such events is called Sound Event Detection (SED). It is sometimes also called Audio Event Detection or Acoustic Event Detection (AED).

There are resources for learning about it online.

I would use a pretrained audio classifier as a base to extract audio embeddings, and then put a small SED model on top. A good candidate is OpenL3.
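As a rough sketch of the embedding step, assuming the openl3 and soundfile packages are installed and that recording.wav is the captured audio (both names are placeholders):

import openl3
import soundfile as sf

# Load the captured recording (placeholder filename)
audio, sr = sf.read("recording.wav")

# Extract one embedding vector per ~0.1 s hop using a pretrained OpenL3 model
embeddings, timestamps = openl3.get_audio_embedding(
    audio, sr, content_type="music", embedding_size=512, hop_size=0.1
)

# embeddings has shape (num_frames, 512); a small SED model trained on these
# frames can then mark which frames contain the expected sound event
print(embeddings.shape, timestamps[:5])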

Upvotes: 1

MURTUZA BORIWALA

Reputation: 556

There are a few ways to tackle this problem:

  1. Convert the expected sound into a sequence of frequency-amplitude
    pairs. Then record the sound via the microphone and convert that
    recording into a corresponding sequence of frequency-amplitude pairs.
    Finally, compare the two sequences to see whether they match (a sketch
    of this pipeline follows the sub-steps below).

    1. This task can be accomplished using the modules scipy, numpy, and matplotlib.

    2. We'll need to generate a sequence of frequency-amplitude pairs for the expected sound. We can do this by using the scipy.io.wavfile.read() function to read in a wave file containing the expected sound. This function returns a tuple containing the sample rate (in samples per second) and a numpy array of waveform samples. We can then use the numpy.fft.fft() function to transform the samples into complex frequency-domain coefficients, whose magnitudes give the amplitude at each frequency.

    3. We'll need to record the sound via the microphone. For this we can use the pyaudio module: create a PyAudio object with the pyaudio.PyAudio() constructor, open a stream on the microphone with the open() method, and read blocks of data from the stream with the read() method. Each block is a raw bytes buffer, which can be converted to a numpy array with numpy.frombuffer() and then transformed with numpy.fft.fft() in the same way as the expected sound.

    4. Finally, we can compare the two sequences of frequency-amplitude pairs. Because the recording will never match the reference exactly (microphone noise, level differences, timing), the comparison should use a tolerance, such as a correlation or distance threshold, rather than exact equality. If the similarity exceeds the threshold, we can conclude that the expected sound was played; otherwise it was not.
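A minimal sketch of steps 2-4, assuming the expected sound is stored in expected.wav, that the recording is captured from the default microphone for roughly the same duration, and that a spectral-correlation threshold of 0.7 is an acceptable pass criterion (all placeholder choices, not tested values):

import numpy as np
import pyaudio
from scipy.io import wavfile

# Step 2: read the expected sound (placeholder filename) and mix to mono
ref_rate, ref_samples = wavfile.read("expected.wav")
if ref_samples.ndim > 1:
    ref_samples = ref_samples.mean(axis=1)

# Step 3: record roughly the same duration from the default microphone
chunk = 1024
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1,
                 rate=ref_rate, input=True, frames_per_buffer=chunk)
frames = []
for _ in range(len(ref_samples) // chunk + 1):
    frames.append(stream.read(chunk))
stream.stop_stream()
stream.close()
pa.terminate()
recorded = np.frombuffer(b"".join(frames), dtype=np.int16).astype(float)

# Step 4: compare magnitude spectra over the common length with a tolerance
n = min(len(ref_samples), len(recorded))
ref_spectrum = np.abs(np.fft.fft(ref_samples[:n]))
rec_spectrum = np.abs(np.fft.fft(recorded[:n]))
similarity = np.corrcoef(ref_spectrum, rec_spectrum)[0, 1]
assert similarity > 0.7, f"expected sound not detected (similarity={similarity:.2f})"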

  2. Use a sound recognition approach to identify the expected sound in the recording, for example by splitting the recording on silence with pydub:

from pydub import AudioSegment
from pydub.playback import play
from pydub.silence import split_on_silence

def get_sound_from_recording():
    sound = AudioSegment.from_wav("recording.wav")
    # Split the recording on silences longer than 1000 ms. Anything under
    # -16 dBFS is treated as silence; 200 ms of silence is kept at the
    # beginning and end of each chunk.
    chunks = split_on_silence(sound, min_silence_len=1000,
                              silence_thresh=-16, keep_silence=200)
    for chunk in chunks:
        play(chunk)
    return chunks
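Each non-silent chunk returned above is a candidate sound event. As a rough sketch of how a test could use this (the expected.wav filename and the 500 ms tolerance are placeholders, not tested values):

expected = AudioSegment.from_wav("expected.wav")
chunks = get_sound_from_recording()
assert chunks, "no sound event detected in the recording"
# AudioSegment lengths are in milliseconds
assert abs(len(chunks[0]) - len(expected)) < 500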
  3. Cross-correlate the recording with the expected sound. This produces a sequence of values indicating how closely the recording matches the expected sound at each offset; a pronounced peak at a particular time index means the expected sound occurs at that point in the recording.
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# read in the recording and the expected (reference) sound, with their sampling rates
sampling_freq, audio = wavfile.read('recording.wav')
_, reference = wavfile.read('expected.wav')

# cross-correlate the recording with the reference waveform
corr = signal.correlate(audio.astype(float), reference.astype(float), mode='valid')

# plot the cross-correlation; a pronounced peak marks the offset at which
# the expected sound occurs in the recording
plt.plot(corr)
plt.show()

This way you can set up your test to check if you are getting the correct output.

Upvotes: 4
