Reputation: 43491
I am using keras and have:
corrupted_samples, corrupted_sample_rate = sf.read(
    self.corrupted_audio_file_paths[index])
frequencies, times, spectrogram = scipy.signal.spectrogram(
    corrupted_samples, corrupted_sample_rate)
As per the docs, this gives:
f (ndarray) - Array of sample frequencies.
t (ndarray) - Array of segment times.
Sxx (ndarray) - Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.
I assume the times will line up across files, so I don't care about the actual time values (I don't think). The same is true of the frequencies. What I actually need is the value at each time for each frequency, which is given by Sxx (or spectrogram in my code). I'm just unsure how to actually feed that into Keras, though it seems like it should be simple.
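To make it concrete, here is roughly what I have in mind (a minimal sketch outside my class; the file path is a placeholder, and the transpose and log scaling are my assumptions about how Keras wants a (timesteps, features) sequence input, not something from the scipy docs):

import numpy as np
import scipy.signal
import soundfile as sf

# Placeholder path, just for illustration.
samples, sample_rate = sf.read('corrupted_example.wav')

frequencies, times, Sxx = scipy.signal.spectrogram(samples, sample_rate)
# Sxx has shape (len(frequencies), len(times)); the last axis is time.
print(Sxx.shape)

# Transpose so each row is one time step with len(frequencies) features,
# i.e. the (timesteps, features) layout I assume a Keras sequence layer expects.
features = np.log(Sxx.T + 1e-10)  # log scaling is my assumption, not required
print(features.shape)  # (n_times, n_frequencies)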
Upvotes: 3
Views: 2073
Reputation: 93
Based on https://towardsdatascience.com/speech-recognition-analysis-f03ff9ce78e9, the author states that a spectrogram is a spectro-temporal representation of the sound and shows some of the steps of converting a WAV file to a spectrogram.
One example could be as below:
## Check the sampling rate of the WAV file.
import wave

import tensorflow as tf

tf.enable_eager_execution()  # needed so the .numpy() calls below work (TF 1.x)

audio_file = './siren_mfcc_demo.wav'
with wave.open(audio_file, "rb") as wave_file:
    sr = wave_file.getframerate()
print(sr)

audio_binary = tf.read_file(audio_file)

# tf.contrib.ffmpeg is not supported on Windows, refer to issue
# https://github.com/tensorflow/tensorflow/issues/8271
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary, file_format='wav', samples_per_second=sr, channel_count=1)
print(waveform.numpy().shape)

signals = tf.reshape(waveform, [1, -1])
signals.get_shape()

# Compute a [batch_size, ?, 128] tensor of fixed-length, overlapping windows
# where each window overlaps the previous by 75% (frame_length - frame_step
# samples of overlap).
frames = tf.contrib.signal.frame(signals, frame_length=128, frame_step=32)
print(frames.numpy().shape)

# `magnitude_spectrograms` is a [batch_size, ?, 129] tensor of spectrograms. We
# would like to produce overlapping fixed-size spectrogram patches; for example,
# for use in a situation where a fixed size input is needed.
magnitude_spectrograms = tf.abs(tf.contrib.signal.stft(
    signals, frame_length=256, frame_step=64, fft_length=256))
print(magnitude_spectrograms.numpy().shape)
The method above refers to https://colab.research.google.com/drive/1Adcy25HYC4c9uSBDK9q5_glR246m-TSx#scrollTo=QTa1BVSOw1Oe
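If tf.contrib is not available to you (it was removed in TensorFlow 2.x), a rough sketch of the same idea with the non-contrib API is below. The file name is a placeholder, and tf.audio.decode_wav only handles 16-bit PCM WAV, so treat this as an assumption rather than a drop-in replacement:

import tensorflow as tf  # TensorFlow 2.x, eager by default

audio_file = './siren_mfcc_demo.wav'  # placeholder path

# Decode a 16-bit PCM WAV file; returns samples in [-1, 1] and the sample rate.
audio_binary = tf.io.read_file(audio_file)
waveform, sample_rate = tf.audio.decode_wav(audio_binary, desired_channels=1)
print(waveform.shape, sample_rate.numpy())

# Reshape to [batch_size, samples]; tf.signal.stft treats the last axis as time.
signals = tf.reshape(waveform, [1, -1])

# Magnitude spectrogram: [batch_size, frames, fft_length // 2 + 1] = [1, ?, 129].
magnitude_spectrograms = tf.abs(tf.signal.stft(
    signals, frame_length=256, frame_step=64, fft_length=256))
print(magnitude_spectrograms.shape)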
Hope this helps.
Upvotes: 2