Reputation: 169
To perform an end-to-end test of an embedded platform that plays musical notes, we are trying to record via a microphone and identify whether a specific sound was played through the device's speakers. The testing setup is not a real-time system, so we don't really know when (or even if) the expected sound begins and ends.
The expected sound is represented in a wave file (or similar) we can read from disk.
How can we run a test that asserts whether the sound was played as expected?
Upvotes: 0
Views: 8704
Reputation: 6259
The kind of sound you are describing, one that has a well-defined duration and can be counted, is called a sound event. The task of detecting such events is called Sound Event Detection (SED). It is sometimes also called Audio Event Detection or Acoustic Event Detection (AED).
There are some resources for learning about it online.
I would use a pretrained audio classifier as a base to extract audio embeddings, and then put a small SED model on top. A good candidate is OpenL3.
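As a rough sketch of the embedding step (the file name is just a placeholder, and the downstream SED classifier is only indicated, not implemented), assuming the openl3 and soundfile packages are installed:

import soundfile as sf
import openl3

# load the test recording (file name is an example)
audio, sr = sf.read("recording.wav")

# one embedding vector per analysis frame (hop_size is in seconds)
emb, timestamps = openl3.get_audio_embedding(audio, sr, content_type="music",
                                             embedding_size=512, hop_size=0.1)

# emb has shape (n_frames, 512); a small SED classifier trained on embeddings
# of the expected sound would then predict, per frame, whether the sound is present
print(emb.shape, timestamps.shape)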
Upvotes: 1
Reputation: 556
There are a few ways to tackle this problem:
Convert the expected sound into a sequence of frequency-amplitude pairs. Then, record the sound via the microphone and convert that recording into a corresponding sequence of frequency-amplitude pairs. Finally, compare the two sequences to see if they match.
This approach can be accomplished using the scipy, numpy, pyaudio, and matplotlib modules.
First, we'll need to generate a sequence of frequency-amplitude pairs for the expected sound. We can do this by using the scipy.io.wavfile.read() function to read in a wave file containing the expected sound. This function returns a tuple containing the sample rate (in samples per second) and a numpy array of waveform samples. We can then use numpy's FFT functions, taking the magnitude of the complex result, to convert those samples into a sequence of frequency-amplitude pairs.
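A minimal sketch of this step might look as follows (the file name is an example; the real-valued FFT magnitude gives the amplitude per frequency bin):

import numpy as np
from scipy.io import wavfile

# read the expected sound: sample rate plus a numpy array of samples
sample_rate, samples = wavfile.read("expected_sound.wav")

# keep a single channel if the file is stereo
if samples.ndim > 1:
    samples = samples[:, 0]

# magnitude of the FFT gives the amplitude of each frequency bin
amplitudes = np.abs(np.fft.rfft(samples))
frequencies = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)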
Next, we'll need to record the sound via the microphone. For this, we'll use the pyaudio module. We can create a PyAudio object using the pyaudio.PyAudio() constructor, and then use the open() method to open an input stream on the microphone. We can then read blocks of data from the stream using the read() method. Each block is a byte string that can be converted into a numpy array of samples. We can then use numpy's FFT functions, as above, to convert those samples into a sequence of frequency-amplitude pairs.
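A sketch of the recording step, assuming a mono 16-bit stream at 44.1 kHz and a fixed recording duration (both are assumptions to adapt to the actual setup):

import numpy as np
import pyaudio

RATE = 44100
CHUNK = 1024
SECONDS = 5  # long enough to cover the expected sound

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

frames = []
for _ in range(int(RATE / CHUNK * SECONDS)):
    # read() returns raw bytes; convert them into a numpy array of samples
    frames.append(np.frombuffer(stream.read(CHUNK), dtype=np.int16))

stream.stop_stream()
stream.close()
p.terminate()

recorded = np.concatenate(frames)
recorded_amplitudes = np.abs(np.fft.rfft(recorded))
recorded_frequencies = np.fft.rfftfreq(len(recorded), d=1.0 / RATE)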
Finally, we can compare the two sequences of frequency-amplitude pairs to see if they match. If they do match, we can conclude that the expected sound was played and recorded correctly; if they don't, we can conclude that it was not.
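One possible comparison, continuing the variables from the two sketches above, is to interpolate both magnitude spectra onto a common frequency grid and compute their cosine similarity (the grid and the 0.8 threshold are assumptions that would need tuning on the real setup):

import numpy as np

# put both spectra on the same frequency grid so they can be compared point by point
common_freqs = np.linspace(0, 20000, 2048)
expected_on_grid = np.interp(common_freqs, frequencies, amplitudes)
recorded_on_grid = np.interp(common_freqs, recorded_frequencies, recorded_amplitudes)

# cosine similarity between the two spectra; 1.0 means identical spectral shape
similarity = np.dot(expected_on_grid, recorded_on_grid) / (
    np.linalg.norm(expected_on_grid) * np.linalg.norm(recorded_on_grid))

assert similarity >= 0.8, "expected sound not detected in the recording"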
Use a sound recognition system to identify the expected sound in the recording.
from pydub import AudioSegment
from pydub.silence import split_on_silence
from pydub.playback import play

def get_sound_from_recording():
    sound = AudioSegment.from_wav("recording.wav")
    # detect silent stretches and split the recording on them:
    # split on silences longer than 1000 ms, treat anything under -16 dBFS as
    # silence, and keep 200 ms of silence at the beginning and end of each chunk
    chunks = split_on_silence(sound, min_silence_len=1000, silence_thresh=-16,
                              keep_silence=200)
    for chunk in chunks:
        play(chunk)
    return chunks
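To turn the detected chunks into a test assertion, one rough option is to compare each chunk against the expected sound by duration and average level (the tolerances below are assumptions; a stricter test would also compare spectral content as in the first approach):

from pydub import AudioSegment

expected = AudioSegment.from_wav("expected_sound.wav")

def chunk_matches_expected(chunk, expected, duration_tol=0.2, level_tol_db=6.0):
    # coarse checks: similar length and similar average loudness
    duration_ok = abs(chunk.duration_seconds - expected.duration_seconds) <= duration_tol
    level_ok = abs(chunk.dBFS - expected.dBFS) <= level_tol_db
    return duration_ok and level_ok

chunks = get_sound_from_recording()
assert any(chunk_matches_expected(c, expected) for c in chunks), \
    "expected sound not found in the recording"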
You can also cross-correlate the recording with the expected sound to find where (or whether) it occurs:
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# read in the recorded wav file and the reference (expected) sound; both are
# assumed to be mono and to use the same sampling rate
sampling_freq, audio = wavfile.read('audio_file.wav')
_, reference = wavfile.read('expected_sound.wav')
# work in floating point to avoid integer overflow during the correlation
audio = audio.astype(float)
reference = reference.astype(float)
# cross-correlate the recording with the reference; a pronounced peak shows
# where (and whether) the expected sound occurs in the recording
corr = signal.correlate(audio, reference)
# plot the cross-correlation signal
plt.plot(corr)
plt.show()
This way you can set up your test to check if you are getting the correct output.
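To make that check automatic, one possibility is to look at the normalized correlation at the best alignment (continuing the corr, audio, and reference variables from the code above) and compare it against a threshold; the 0.5 value is an assumption and should be tuned on real recordings:

import numpy as np

# locate the best alignment between the recording and the reference
lag = int(np.argmax(corr))
start = max(lag - len(reference) + 1, 0)
segment = audio[start:start + len(reference)]
ref = reference[:len(segment)]

# normalized correlation at the best alignment: close to 1.0 when the
# expected sound is present, independent of the recording level
score = np.dot(segment, ref) / (np.linalg.norm(segment) * np.linalg.norm(ref) + 1e-12)

assert score > 0.5, "expected sound was not detected in the recording"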
Upvotes: 4